DETERMINING THE FUNCTIONS AND INTERACTIONS OF
PROTEINS BY COMPARATIVE ANALYSIS 

Related Applications 

The present application is a continuation-in-part ("CIP") of Patent
Cooperation Treaty (PCT) International Application Serial No. PCT/US00/02246, filed in the
U.S. receiving office on January 28, 2000, and this application claims the benefit of priority
under 35 U.S.C. § 119(e) of U.S. Provisional Application Nos. 60/165,124 and 60/165,086,
both filed November 12, 1999, and U.S. Provisional Application No. 60/179,531, filed February
1, 2000. International Application Serial No. PCT/US00/02246 claims the benefit of priority
under 35 U.S.C. § 119(e) of U.S. Provisional Application Serial No. 60/117,844, filed January
29, 1999, U.S. Provisional Application Serial No. 60/118,206, filed February 1, 1999, U.S.
Provisional Application Serial No. 60/126,593, filed March 26, 1999, U.S. Provisional
Application Serial No. 60/134,093, filed May 14, 1999, and U.S. Provisional Application
Serial No. 60/134,092, filed May 14, 1999. Each of the aforementioned applications is
explicitly incorporated herein by reference in its entirety and for all purposes.

TECHNICAL FIELD 

This invention generally relates to genetics and microbiology. The invention
provides novel methods to identify the function of and relationships between nucleic acid and
protein sequences. The method is particularly useful for identifying genes and
polypeptides having potential therapeutic relevance in organisms, e.g., microorganisms, such
as Mycobacterium tuberculosis. The invention also provides Mycobacterium tuberculosis
genes and polypeptides found by these methods. These genes and polypeptides are useful as
potential drug targets.

BACKGROUND 

The determination of the functions of and relationships between nucleic acid 
and protein sequences has traditionally relied on either the study of homology and sequence 
identity with genes and proteins of known function or, in the absence of informative 
homology, laborious experimental work. The availability of many complete genome 
sequences has made it possible to develop new strategies for computational determination of 
protein functions. Several methods have been developed which can predict the general 
function of proteins by analyzing their functional relationships rather than sequence
similarity. Generally, two proteins can be considered functionally related when they form
part of the same biochemical pathway or biological process. For example, although malate
dehydrogenase is not homologous to pyruvate carboxylase, and the two enzymes do not
catalyze the same reaction, they are functionally related because they both catalyze steps of a
common biochemical pathway, namely the tricarboxylic acid cycle.

New methods that can establish such functional relationships could provide 
valuable information on the functions of uncharacterized nucleic acid and protein sequences. 

The disease tuberculosis, caused by Mycobacterium tuberculosis (MTB), is one
of the world's leading killers. The World Health Organization estimates that 30 million deaths
from pulmonary tuberculosis will occur during this decade. Alarming reports on the
emergence of drug-resistant strains of this bacterium underscore the importance of the search
for new therapeutic agents. Identifying the function of every protein produced by MTB will
provide researchers with promising new targets for anti-tuberculosis drug design.

SUMMARY

The invention provides novel methods for characterizing the function of
nucleic acids and polypeptides. The invention provides a novel method for identifying a
nucleic acid or a polypeptide sequence that may be a target for a drug. The invention provides
a novel method for identifying a nucleic acid or a polypeptide sequence that may be essential
for the growth or viability of an organism. The characterization is based on use of methods of
the invention comprising algorithms that can identify functional relationships between diverse
sets of non-homologous nucleic acid and polypeptide sequences. Characterization of nucleic
acid and protein sequences can be the basis for the development of compositions that can
interact with those nucleic acids and polypeptides. For example, such characterization can
provide a basis for screening methods. Such characterization may allow use of these
sequences as targets for drug discovery. Discovery of such compositions can provide the
basis for the design of novel drugs, particularly if the characterized sequences are derived
from a pathogen.

The invention provides a method for identifying a nucleic acid or a
polypeptide sequence that may be a target for a drug, comprising the following steps: (a)
providing a first nucleic acid or a polypeptide sequence that is known to be a drug target; (b)
providing at least one algorithm selected from the group consisting of a "domain fusion"
method, a "phylogenetic profile" method and a "physiologic linkage" method, wherein the
algorithm is capable of analyzing a functional relationship between nucleic acid or polypeptide
sequences; and, (c) comparing the first nucleic acid or the polypeptide drug target sequence
to a plurality of sequences using at least one of the algorithms as set forth in step (b) to
identify a second sequence that has a functional relationship to the first sequence, thereby
identifying a nucleic acid or a polypeptide sequence that may be a target for a drug.

The invention provides a method for identifying a nucleic acid or a 
polypeptide sequence that may be essential for the growth or viability of an organism 
comprising the following steps: (a) providing a first nucleic acid or a polypeptide sequence 
that is known to be essential for the growth or viability of an organism; (b) providing at least 
one algorithm capable of analyzing a functional relationship between nucleic acid or
polypeptide sequences selected from the group consisting of a "domain fusion" method, a 
"phylogenetic profile" method and a "physiologic linkage" method; and, (c) comparing the 
first nucleic acid or the polypeptide sequence to a plurality of sequences using at least one of 
the algorithms as set forth in step (b) to identify a second sequence that has a functional 
relationship to the first sequence, thereby identifying a nucleic acid or a polypeptide 
sequence that may be essential for the growth or viability of an organism. 

In one aspect of the methods of the invention, the drug is an anti-microbial 
drug. In another aspect, the first nucleic acid or a polypeptide sequence is derived from a 
pathogen. The pathogen can be a microorganism, such as Mycobacterium tuberculosis 
(MTB). 

The plurality of sequences used to identify a second sequence can comprise a 
database of the gene sequences of an entire genome of an organism. The plurality of 
sequences used to identify a second sequence can comprise a database of the gene sequences 
derived from a pathogen. 

In one aspect of the methods of the invention, the "phylogenetic profile" 
method algorithm comprises (a) obtaining data, comprising a list of proteins from at least two 
genomes; (b) comparing the list of proteins to form a protein phylogenetic profile for each 
protein, wherein the protein phylogenetic profile indicates the presence or absence of a 
protein belonging to a particular protein family in each of the at least two genomes based on
homology of the proteins; and (c) grouping the list of proteins based on similar profiles,
wherein proteins with similar profiles are indicated to have a functional relationship. The
phylogenetic profile can be in the form of a vector, matrix or phylogenetic tree. The
"phylogenetic profile" method can further comprise determining the significance of
homology between the proteins by computing a probability (p) value threshold. The
probability threshold can be set with respect to the value 1/NM, based on the total number of
sequence comparisons that are to be performed, wherein N is the number of proteins in the
first organism's genome and M is the number of proteins in all other genomes. The presence
or absence of a protein belonging to a particular protein family in each of the at least two
genomes can be determined by calculating an evolutionary distance. The evolutionary
distance can be calculated by: (a) aligning two sequences from the list of proteins; (b)
determining an evolution probability process by constructing a conditional probability
matrix $p(aa \to aa')$, where aa and aa' are any amino acids, said conditional probability
matrix being constructed by converting an amino acid substitution matrix from a log odds
matrix to said conditional probability matrix; (c) accounting for an observed alignment of
the constructed conditional probability matrix by taking the product of the conditional
probabilities for each aligned pair during the alignment of the two sequences, represented by
$P(p) = \prod_n p(aa_n \to aa'_n)$; and, (d) determining an evolutionary distance $\alpha$
from the powers equation $p' = p^{\alpha}(aa \to aa')$, maximizing for $P$. The conditional
probability matrix can be defined by a Markov process with substitution rates, over a fixed
time interval. The conversion from an amino acid substitution matrix to a conditional
probability matrix can be represented by:

$$p_B(i \to j) = p(j)\, 2^{\mathrm{BLOSUM62}_{ij}/2}$$

where BLOSUM62 is an amino acid substitution matrix, and $p_B(i \to j)$ is the
probability that amino acid i is replaced by amino acid j through point mutations according to
BLOSUM62 scores. In one aspect, the $p(j)$'s are the abundances of amino acid j and are
computed by solving a plurality of linear equations given by the normalization condition that:

$$\sum_j p_B(i \to j) = 1.$$
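
Written out, the normalization condition can be read as a linear system for the amino acid abundances. The following is only a sketch, assuming that the conversion takes the half-bit form $p_B(i \to j) = p(j)\,2^{\mathrm{BLOSUM62}_{ij}/2}$ and that $i$ and $j$ run over the 20 standard amino acids; the matrix name $A$ is introduced here for illustration.

```latex
\sum_{j=1}^{20} 2^{\mathrm{BLOSUM62}_{ij}/2}\, p(j) = 1 \quad (i = 1, \dots, 20)
\qquad\Longleftrightarrow\qquad
A\,\mathbf{p} = \mathbf{1}, \quad A_{ij} = 2^{\mathrm{BLOSUM62}_{ij}/2}
```

Under these assumptions there are twenty equations in the twenty unknowns $p(1), \dots, p(20)$, so the abundances follow from a single linear solve, after which each row of $p_B$ sums to one by construction.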

In alternative aspects of the methods of the invention, the "physiologic
linkage" method algorithm identifies proteins and nucleic acids that participate in a common
functional pathway; identifies proteins and nucleic acids that participate in the synthesis of a
common structural complex; and, identifies proteins and nucleic acids that participate in a
common metabolic pathway.

In one aspect of the invention, the "domain fusion" method algorithm
comprises (a) aligning a first primary amino acid sequence of multiple distinct
non-homologous polypeptides to a second primary amino acid sequence of a plurality of
proteins; and, (b) for any alignment found between the first primary amino acid sequences of
all of such multiple distinct non-homologous polypeptides and at least one protein of the
second primary amino acid sequences, outputting an indication identifying the aligned second
primary amino acid sequence as an indication of a functional link between the aligned first
and second polypeptide sequences. The aligning can be performed by an algorithm selected
from the group consisting of a Smith-Waterman algorithm, a Needleman-Wunsch algorithm, a
BLAST algorithm, a FASTA algorithm, and a PSI-BLAST algorithm. The multiple distinct
non-homologous polypeptides can be obtained by translating a nucleic acid sequence from a
genome database. The plurality of proteins can have a known function. At least one of the
multiple distinct non-homologous polypeptides can have a known function. At least one of
the multiple distinct non-homologous polypeptides can have an unknown function. The
alignment can be based on the degree of homology of the multiple distinct non-homologous
polypeptides to the plurality of proteins. The "domain fusion" method can comprise
determining the significance of the aligned and identified second primary amino acid
sequence by computing a probability (p) value threshold. The probability threshold can be
set with respect to the value 1/NM, based on the total number of sequence comparisons that
are to be performed, wherein N is the number of proteins in a first organism's genome and M
is the number of proteins in all other genomes. The "domain fusion" method can further
comprise filtering excessive functional links between one first primary amino acid sequence
of multiple distinct non-homologous polypeptides and an excessive number of other distinct
non-homologous polypeptides for any alignment found between the first primary amino acid
sequences of the distinct non-homologous polypeptides and at least one of the second primary
amino acid sequences of the plurality of proteins.

The invention provides a computer program product, stored on a
computer-readable medium, for identifying a nucleic acid or a polypeptide sequence that may
be a target for a drug, the computer program product comprising instructions for causing a
computer system to be capable of: (a) inputting a first nucleic acid or a polypeptide sequence
that is known to be a drug target; (b) accessing at least one algorithm capable of analyzing a
functional relationship between nucleic acid or polypeptide sequences selected from the
group consisting of a "domain fusion" method, a "phylogenetic profile" method and a
"physiologic linkage" method; and (c) comparing the first nucleic acid or the polypeptide
drug target sequence to a plurality of sequences using at least one of the algorithms set forth
in step (b) to identify a second sequence that has a functional relationship to the first
sequence and generating an output identifying a nucleic acid or a polypeptide sequence that
may be a target for a drug.

The invention provides a computer program product, stored on a
computer-readable medium, for identifying a nucleic acid or a polypeptide sequence that may
be essential for the growth or viability of an organism, the computer program product
comprising instructions for causing a computer system to be capable of: (a) providing a first
nucleic acid or a polypeptide sequence that is known to be essential for the growth or
viability of an organism; (b) accessing at least one algorithm capable of analyzing a functional
relationship between nucleic acid or polypeptide sequences selected from the group
consisting of a "domain fusion" method, a "phylogenetic profile" method and a "physiologic
linkage" method; and, (c) comparing the first nucleic acid or the polypeptide sequence to a
plurality of sequences using at least one of the algorithms set forth in step (b) to identify a
second sequence that has a functional relationship to the first sequence and generating an
output identifying a nucleic acid or a polypeptide sequence that may be essential for the
growth or viability of an organism.

The invention provides a computer system, comprising: (a) a processor; and,
(b) a computer program product of the invention.

All publications, patents, patent applications, GenBank sequences and ATCC
deposits cited herein are hereby expressly incorporated by reference for all purposes.

The details of one or more embodiments of the invention are set forth in the
accompanying drawings and the description below. Other features, objects, and advantages
of the invention will be apparent from the description and drawings, and from the claims.

DESCRIPTION OF DRAWINGS 

Figure 1 is an example of functional linkages predicted between InhA
(Rv 1484) and other TB genes.

Figure 2 is an example of predicted functional linkages between embB
(Rv 3795), which is a target of the drug ethambutol, and other TB genes using the
phylogenetic profile method.

Figure 3 is an example of predicted functional linkages between five TB genes
having homology to penicillin binding proteins and other TB genes.

Figure 4 shows that gcpE (Rv 2868C) is predicted to be functionally linked to cell
wall metabolism.

Figure 5 shows predicted functional linkages of htrA (Rv 1223C) with other
TB genes.

Like reference symbols in the various drawings indicate like elements. 

DETAILED DESCRIPTION

The present invention provides novel methods for identifying the relationships
between and the function of nucleic acid and polypeptide sequences. The methods of the
invention identify novel genes and polypeptides on the basis of their functional linkage to
other proteins whose biological functions or processes are known or inferred by homology.

The genes and polypeptides identified by the methods of the invention can be
used in screening methods for the identification of compositions which, by binding or
otherwise interacting with the gene or polypeptide, are capable of modifying the physiology
and growth of an organism. The compositions identified by these screening methods are
useful as drugs and pharmaceuticals. Thus, genes and polypeptides identified by the methods
of the invention, including the genes and polypeptides identified herein, can be used as
potential drug targets.

One aspect of the invention provides methods for identifying the function of
genes and polypeptides from Mycobacterium tuberculosis (MTB or TB). Based on this new
functional determination, these genes and polypeptides can be used to screen for
compositions capable of modifying the physiology and growth of Mycobacterium
tuberculosis (TB). Thus, genes and polypeptides identified by the methods of the invention,
including the genes and polypeptides identified herein, can be used as targets in screening
protocols and can be useful as potential drug targets.

The functions of the TB genes and polypeptides of the present invention were
identified using the methods of the invention; i.e., they were identified on the basis of their
functional linkage to other proteins whose biological functions or processes were known by
experiment or inferred by homology. TB genes and polypeptides that are functionally linked
to genes known to be involved in pathogenesis or organism survival are potential drug
targets. Genes or polypeptides that are associated with TB pathogenesis or survival, or that
are important or unique to TB biochemical pathways, are potential drug targets. TB genes and
polypeptides that have no homologues identified in humans are potential drug targets. The
function of many of the TB genes and polypeptides identified is based on the genes or
polypeptides with which they are functionally linked.

The fact that TB genes whose function was identified using the methods of the
invention are effectively targeted by a drug (i.e., they can act as bona fide drug targets)
provides proof of principle that the invention's methods for identifying functionally linked
genes can identify TB genes and polypeptides that are drug targets. Further confirmation
that the genes identified by the methods of the invention include bona fide drug targets is
provided by the fact that genes already known to be targets for drugs have been
independently identified, or "re-discovered," by the invention's methods.

The novel TB genes described herein are identified as being functionally
related or linked to other genes, including other TB genes, such as a known TB drug target
(e.g., InhA polypeptide, which is a target of isoniazid). These functional linkages are
established using mathematical algorithms. The assignment or inference of a function to TB
genes and polypeptides based on their linkage or relatedness to other genes and polypeptides
is described in U.S. provisional application serial no. 60/165,086. Potential TB drug targets
are identified by several methods discussed herein and in further detail in U.S. provisional
application serial no. 60/134,092. Through the use of these methods, TB genes and
polypeptides have been identified as potential drug targets and are illustrated in Tables 1 and
2, and Figures 1 to 5. The nucleotide and amino acid sequences of these potential drug
targets are illustrated in Tables 3 and 4, respectively (see below).

The phrases "functional link," "functionally related" and grammatical
variations thereof, when used in reference to genes or polypeptides, mean that the genes or
polypeptides are predicted to be linked or related. A particular example of functionally
related or linked proteins is where two proteins participate in a biochemical or metabolic
pathway (e.g., malate dehydrogenase and fumarase, which are both present in the TCA
cycle). Thus, although functionally linked or related proteins may not have sequence
homology to each other, they are linked by virtue of their participation in the same
biochemical pathway. Other examples of linked or related polypeptides are where two
polypeptides are part of a protein complex, physically interact, or act upon one another.

The "domain fusion" or "Rosetta Stone" method searches protein sequences
across all known genomes and identifies proteins that are separate in one organism but joined
as intramolecular domains into one larger protein in another organism. Such proteins that are
separate in some organisms but joined in others often carry out related or sequential functions
and are therefore functionally linked.

The phylogenetic profile method compares protein sequences across all
known genomes and analyzes the pattern of inheritance of each protein across the different
organisms. Proteins that have similar patterns of inheritance, either acquired or lost as a part
of a group of proteins through evolution, are functionally linked. The gene proximity method
identifies genes that remain physically close or "clustered" throughout evolution and are
therefore functionally linked.
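
The grouping step of the phylogenetic profile method can be sketched as follows. This is a minimal illustration, not the implementation of the invention: it assumes that homolog detection has already been done, represents each profile as a tuple of 0/1 flags (one per genome), and treats two proteins as linked when their profiles differ in at most `max_mismatch` genomes. The function names, the mismatch criterion and the toy data are all hypothetical.

```python
from itertools import combinations


def build_profiles(presence):
    """Map each protein to a tuple of 0/1 flags, one per genome.

    `presence` maps a protein name to the set of genomes in which a
    homolog of that protein was detected.
    """
    genomes = sorted({g for found in presence.values() for g in found})
    profiles = {
        protein: tuple(int(g in found) for g in genomes)
        for protein, found in presence.items()
    }
    return genomes, profiles


def hamming(a, b):
    """Number of genomes in which two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


def linked_pairs(profiles, max_mismatch=0):
    """Protein pairs whose profiles differ in at most `max_mismatch` genomes."""
    return sorted(
        (p, q)
        for p, q in combinations(sorted(profiles), 2)
        if hamming(profiles[p], profiles[q]) <= max_mismatch
    )


# Hypothetical homolog-detection results (not data from this document).
presence = {
    "P1": {"genomeA", "genomeC"},
    "P2": {"genomeA", "genomeC"},
    "P3": {"genomeB"},
    "P4": {"genomeA", "genomeB", "genomeC"},
}
genomes, profiles = build_profiles(presence)
print(linked_pairs(profiles))
```

With identical profiles required (`max_mismatch=0`), only P1 and P2 are grouped; relaxing the criterion to one mismatch also links each of them to P4.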

A particular example of the identification of a potential TB drug target would
be to identify a TB gene or polypeptide functionally linked to a known drug target. Anti-TB
drugs include isoniazid, rifampicin, ethambutol, streptomycin, pyrazinamide, and
thiacetazone. Isoniazid is believed to act through enoyl-acyl reductase InhA,
resulting in mycolic acid biosynthesis inhibition. Thus, TB genes or polypeptides
functionally linked to enoyl-acyl reductase InhA are potential drug targets; see Figure 1,
which shows an analysis of InhA, the target for isoniazid, the most widely used
anti-tuberculosis drug, and functional linkages to a set of genes mostly known or
hypothesized to be involved in cell wall-related processes and lipid and polyketide
metabolism. Particular examples of the identification of several TB genes and polypeptides
that are functionally related to the targets of these anti-TB drugs are shown in Figures 1 to 5.

"Domain Fusion" or "Rosetta Stone" Method 

The "domain fusion" or "Rosetta Stone" method compares protein sequences
across known nucleic acid databases (e.g., known genomes) to identify genes and proteins
that are separate entities in one organism but are joined into one larger multidomain protein
in another organism. In such cases, the two separate proteins often carry out related or
sequential functions or form part of a larger protein complex. Therefore, the general function
of one component (e.g., one or more of the unknown proteins) can be inferred from the
known function of the other component. In addition, merely identifying links between
proteins using the method described herein provides valuable information (e.g., usefulness as
a target for an antibacterial drug), regardless of whether the function of one or more of the
proteins used to form the link(s) is known. Because the two components do not have similar
amino acid sequences, the function of one could not be inferred from the other on the basis of
sequence similarity alone.

The methods for identifying drug targets (e.g., TB drug targets) described
herein (e.g., the "Rosetta Stone Method") are based on the idea that proteins that participate
in a common structural complex, metabolic pathway or biological process, or that have
closely related physiological functions, are functionally linked. In addition, these methods
are capable of identifying proteins that interact physically with one another. Functionally
linked proteins in one organism can often be found fused into a single polypeptide chain in a
different organism. Similarly, fused proteins in one organism can be found as individual
proteins in other organisms. For example, in a first organism one might identify two
un-linked proteins "A" and "B" with unknown function. In another organism, one may find a
single protein "AB" with a part that resembles "A" and a part that resembles "B". Protein
AB allows one to predict that "A" and "B" are functionally related.
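
The "A", "B" and "AB" example can be sketched in code. The sketch below is only an illustration under stated assumptions: it starts from precomputed alignment hits given as hypothetical `(query, subject, start, end)` tuples, with inclusive coordinates on the subject protein, and reports a link when two different query proteins align to non-overlapping regions of the same subject. It does not check that the two queries are non-homologous to each other, which the method also calls for.

```python
from collections import defaultdict
from itertools import combinations


def rosetta_links(hits):
    """Find query pairs that align to non-overlapping regions of one subject.

    `hits` is an iterable of (query, subject, start, end) tuples, where
    start and end are inclusive subject coordinates of a significant hit.
    Returns {(query_a, query_b): [subject, ...]}.
    """
    by_subject = defaultdict(list)
    for query, subject, start, end in hits:
        by_subject[subject].append((query, start, end))

    links = defaultdict(list)
    for subject, regions in by_subject.items():
        for (qa, sa, ea), (qb, sb, eb) in combinations(regions, 2):
            if qa == qb:
                continue  # the same protein hitting twice is not a link
            overlaps = sa <= eb and sb <= ea
            if not overlaps:
                pair = tuple(sorted((qa, qb)))
                if subject not in links[pair]:
                    links[pair].append(subject)
    return dict(links)


# Hypothetical hits: A and B cover separate parts of AB, while A and C
# compete for the same region of X and are therefore not linked by it.
hits = [
    ("A", "AB", 1, 180),
    ("B", "AB", 200, 410),
    ("A", "X", 5, 170),
    ("C", "X", 100, 260),
]
print(rosetta_links(hits))
```

Requiring non-overlapping regions is a design choice made here so that a single shared domain is not mistaken for a fusion of two distinct proteins.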

The functional activity of each distinct protein in the "Rosetta Stone" method
need not be known prior to performing the method (i.e., the function of A, B, or AB need not
be known). Using the "Rosetta Stone" method to compare and analyze several unknown
protein sequences can provide information regarding relationships of each protein absent
knowledge about the functional activity of the initially analyzed proteins themselves. For
example, the links can indicate that the proteins are part of
a common pathway, function in a related process or physically interact. Such information
need not be based on the biological function of the individual proteins.

These methods can provide information regarding links between previously
un-linked proteins that function, for example, in a concerted process. For example, a marker
for a particular disease state may be identified by the presence or absence of a protein (e.g.,
Her2/neu in breast cancer detection). Links (i.e., information) identified by the method
which link proteins "B" and "C" to such a marker suggest that proteins "B" and "C" are
related to the marker by function, by physical interaction, or as part of a common biological
pathway. Such information is useful in designing screening methods and identifying drug
targets (e.g., TB drug targets), making diagnostics, and designing therapeutics.

In one approach, the "Rosetta Stone" method is performed by sequence
comparison that searches for incomplete "triangle relationships" between, for example, three
proteins, i.e., for two proteins A' and B' that are different from one another but similar in
sequence to another protein AB. Completing the triangle relationship provides useful
information regarding the proteins' biological function(s), functional interaction, pathway
relationships or physical relationships with other proteins in the "triangle."

Either nucleotide sequences or amino acid sequences can be used in the
methods for identifying functionally related or linked genes or polypeptides. Where a
nucleic acid sequence is to be used, it can first be translated to an amino
acid sequence. Such translation may be performed in all frames if the coding sequence is not
known. Programs that can translate a nucleic acid sequence are known in the art. In
addition, although for simplicity the description of this method discusses the use of a "pair"
of proteins in the determination of a "Rosetta Stone" protein, more than two may be used
(e.g., 3, 4, 5, 10, 100 or more proteins). Accordingly, one can analyze chains of linked
proteins, such as "A" linked by a Rosetta Stone protein to "B" linked by a Rosetta Stone
protein to "C", etc. By this method, groups of functionally related proteins can be found and
their function identified.
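
Translation "in all frames" can be illustrated with a short six-frame translation sketch. It assumes the standard genetic code, uppercase or lowercase DNA input, and a hypothetical frame-naming scheme ("+1" to "+3" for the given strand, "-1" to "-3" for the reverse complement); codons containing ambiguous bases are rendered as "X". It is not one of the translation programs referred to above.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon; '*' marks a stop, 'X' an unknown codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate both strands in all three reading frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


print(six_frame_translation("ATGGCC"))
```

Each of the six translated products could then be used as a probe sequence in the same way as a protein sequence taken directly from a database.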

A method can start with identifying the primary amino acid sequence for a
plurality of proteins whose functional relationship is to be determined (e.g., protein A' and
protein B'). A number of source databases are available, as described above, that contain
either a nucleic acid sequence and/or a deduced amino acid sequence for use with the first
step. The plurality of sequences (the "probe sequences") are then used to search a sequence
database, e.g., GenBank (NCBI, NLM, NIH), PFAM (a large collection of multiple sequence
alignments and hidden Markov models covering many common protein domains;
Washington University, St. Louis MO) or ProDom (a database based on recursive PSI-BLAST
searches and designed as a tool to help analyze domain arrangements of proteins and
protein families, see, e.g., Corpet (1999) Nucleic Acids Res. 27:263-267), either
simultaneously or individually. Every protein in the sequence database is examined for its
ability to act as a "Rosetta Stone" protein (i.e., a single protein containing polypeptide
sequences or domains from both protein A' and protein B'). A number of different methods
of performing such sequence searches are known in the art. Such sequence alignment
methods include, for example, BLAST (see, e.g., Altschul (1990) J. Mol. Biol. 215:403-410),
BLITZ (MPsrch) (see, e.g., Brenner (1995) Trends Genet. 11:330-331; and infra), and
FASTA (see, e.g., Pearson (1988) Proc. Natl. Acad. Sci. USA 85(8):2444-2448; and infra).
The probe sequence can be any length (e.g., about 50 amino acid residues to about 1000
amino acid residues).

Probe sequences (e.g., polypeptide sequences or domains) found in a single
protein (e.g., an "AB" multidomain protein) are defined as being "linked" by that protein.
Where the probe sequences are used individually to search the sequence database, one can
mask those segments having homology to the first probe sequence found in the proteins of
the sequence database prior to searching with the subsequent probe sequence. In this way,
one eliminates any potential overlapping sequences between the two or more probe
sequences.

The linked proteins can then be further compared for similarity with one
another by amino acid sequence comparison. Where the sequences are identical or have high
homology, such a finding can be indicative of the formation of homo-dimers, -trimers, etc.
Typically, "Rosetta Stone"-linked proteins are only kept when the linked proteins show no
homology to one another (e.g., hetero-dimers, -trimers, etc.).

In another method for identifying functional linkages, a potential fusion
protein lacking any functional information that is suspected of having two or more domains
(e.g., a potential "Rosetta Stone" protein) may be used to search for related proteins. In this
method, the primary amino acid sequence of the fusion protein is determined and used as a
probe sequence. This probe sequence is used to search a sequence database (e.g., GenBank,
PFAM or ProDom). Every protein in the sequence database is examined for homology to the
potential fusion protein (i.e., multiple proteins containing polypeptide sequences or domains
from the potential fusion protein). A number of different methods of performing such
sequence searches are known in the art, e.g., BLAST, BLITZ (Biocomputing Research Unit,
University of Edinburgh, Scotland; the "MPsrch program" performs comparisons of protein
sequences against the Swiss-Prot protein sequence database using the Smith and Waterman
best local similarity algorithm), and FASTA.

Probe sequences found in more than one protein (e.g., A' and B' proteins) are
defined as being "linked" so long as at least one protein per domain containing that domain
but not the other is also identified. In other words, at least one protein or domain of the
plurality of proteins must also be found alone in the sequence database. This verifies that the
protein or domain is not an integral part of a first protein but rather a second independent
protein having its own functional characteristics.

Statistical methods can be used to judge the significance of possible matches.
The statistical significance of an alignment score is described by the probability, P, of
obtaining a higher score when the sequences are shuffled. One way to compute a P value
threshold is to first consider the total number of sequence comparisons that are to be
performed. For example, if there are N proteins in E. coli and M in all other genomes, this
number is N x M. If this number of comparisons were performed between random sequences,
one pair would be expected to yield a P value of 1/(N x M) by chance; this value is then set
as the threshold.
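
As an illustration of the threshold just described, the following minimal Python sketch computes 1/(N x M) and applies it to alignment P values. The function names and the protein counts are hypothetical and serve only as an example; they are not taken from the analysis described herein.

```python
def p_value_threshold(n_query_proteins, m_other_proteins):
    """Return 1/(N x M), the P value expected once by chance in N x M comparisons."""
    if n_query_proteins <= 0 or m_other_proteins <= 0:
        raise ValueError("protein counts must be positive")
    return 1.0 / (n_query_proteins * m_other_proteins)


def is_significant(p_value, threshold):
    """A match is kept only if its P value falls below the threshold."""
    return p_value < threshold


# Illustrative counts only (assumed, not measured values).
threshold = p_value_threshold(4000, 25000)
```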

This method provides information regarding which proteins are functionally
related (e.g., related biological functions, common structural complexes, metabolic pathways
or biological processes), a subset of which physically interact in an organism.

Alignment Algorithms

To align sequences, a number of different procedures can be used that produce
a good match between the corresponding residues in the sequences. Typically, the
Smith-Waterman algorithm (Smith (1981) Adv. Appl. Math. 2:482) or the Needleman-Wunsch
algorithm (Needleman (1970) J. Mol. Biol. 48:443) is used; however, other, faster
procedures such as BLAST, FASTA, PSI-BLAST (a version of BLAST for finding protein
families), or others known in the art (see infra discussion), can be used.

Filtering Methods 

The Rosetta Stone Method provides at least two pieces of information. First,
the method provides information regarding which proteins are functionally related. Second,
the method provides information regarding which proteins are physically related. Each of
these two pieces of information has different sources of error. The first type
of error is introduced by protein sequences that occur in many different proteins and are
paired with many other protein sequences. The second type of error is introduced by there
often being multiple copies of similar proteins, called paralogs, in a single organism. In
general, the "Rosetta Stone" method predicts functionally related proteins well, with no
filtering of results required. However, it is possible to filter the error associated with
either the first or second type of information.

The invention recognizes that a few domains are linked to an excessive
number of other domains by a "Rosetta Stone" protein. For example, 95% of the domains
are linked to fewer than 25 other domains. However, some domains, e.g., the Src Homology
3 (SH3) domain or ATP-binding cassette (ABC) domains, link to more than a hundred other
domains. These links were filtered by removing all links involving these 5% of
domains (i.e., the domains linked to more than 25 other domains). For example, in E. coli,
without filtering, 3531 links were identified using the domain-based analysis, but after
filtering only 749 links were identified. This method improved prediction of functionally
related proteins by 28% and physically related proteins by 47%. Accordingly, there are a
number of ways to filter the results to improve the significance of the functional links. As
described above, as the number of functional links increases there is a higher chance of
finding a "Rosetta Stone" protein by chance. By removing the excessively linked proteins one
reduces the number of chance "Rosetta Stone" proteins, thereby increasing the significance of
a functional link.
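
The filtering step described above can be sketched as follows. This is a minimal illustration, assuming that links are available as pairs of domain identifiers; the function name and the data layout are hypothetical, and the cutoff of 25 partners is the one stated above.

```python
from collections import defaultdict


def filter_promiscuous_links(links, max_partners=25):
    """Drop every link that involves a domain linked to more than max_partners domains."""
    partners = defaultdict(set)
    for a, b in links:
        partners[a].add(b)
        partners[b].add(a)
    # Domains such as SH3 or ABC would typically end up in this set.
    promiscuous = {d for d, linked in partners.items() if len(linked) > max_partners}
    return [(a, b) for a, b in links if a not in promiscuous and b not in promiscuous]
```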

Error introduced by multiple paralogs of linked proteins should have little
effect on functional prediction, as paralogs usually have very similar function, but will affect
the reliability of prediction of protein-protein interactions. This error is calculated for
each linked protein pair, and can be estimated roughly as:

Fractional Error = 1 - , 
N 

where N is the number of paralogous protein pairs (e.g., A linked to B, A' linked to
B', A linked to B', and A' linked to B, in the case that A and A' are paralogs, as are B
and B', and the linking protein is AB as above).

The error can also be estimated as 1 - T, where T is the mean percent of
potential true positives calculated for all domain pairs in an organism. For each domain pair
linked by a Rosetta Stone protein, there are n proteins with the first domain but not the
second, and m proteins with the second domain but not the first. The percent of true
positives T is therefore estimated as the smaller of n or m divided by n times m. As T
can be calculated for each set of linked domains, it can describe the confidence in any
particular predicted interaction.
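
The estimate of T described above can be written out as a short Python sketch. The function names are hypothetical; the arithmetic follows the description (the smaller of n or m, divided by n times m), and the mean is taken over whatever set of domain pairs is supplied.

```python
def true_positive_fraction(n, m):
    """T for one domain pair: min(n, m) / (n * m)."""
    if n < 1 or m < 1:
        raise ValueError("n and m must each be at least 1")
    return min(n, m) / (n * m)


def mean_interaction_error(domain_pair_counts):
    """Estimate the error as 1 - mean(T) over a list of (n, m) counts."""
    fractions = [true_positive_fraction(n, m) for n, m in domain_pair_counts]
    return 1.0 - sum(fractions) / len(fractions)
```

For example, with two proteins carrying only the first domain and two carrying only the second, T is 2/4 = 0.5, so half of the four possible pairings are expected to be true interactions.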

In addition, the error in functional links can be caused by small conserved
regions or repeated common amino acid sequences being repeatedly identified in a "Rosetta
Stone" protein by a plurality of distinct non-homologous polypeptides. To reduce this error
the percent of identity between the "Rosetta Stone" and the distinct non-homologous
polypeptide can be measured. Alignment percentages of about 50% to about 90%, or,
alternatively, about 75%, between the "Rosetta Stone" and the distinct polypeptide are
indicative of links that are not subject to this small peptide sequence error.

Phylogenetic Pathway Method 

The "phylogenetic profile" method compares protein sequences across all
known genomes and analyzes the pattern of inheritance of each protein across the different
organisms. In its simplest form, each protein is simply characterized by its presence or
absence in each organism. For example, if there are 16 known genomes, then each protein
may be assigned a 16-bit code or phylogenetic profile. Since proteins that function together
(e.g., in the same metabolic pathway or as part of a larger functional or structural complex)
evolve in a correlated fashion, they should have the same or similar patterns of inheritance,
and therefore similar phylogenetic profiles. Therefore, the function of one protein may be
inferred from the function of another protein, which has a similar profile, if its function is
known. As with the Rosetta Stone method, the function of one protein is inferred from the
function of another protein which is dissimilar in sequence. Furthermore, the predicted link
between the proteins has utility in developing, for example, drug targets, diagnostics and
therapeutics.

The phylogenetic profile method can be implemented in a binary code (i.e.,
describing the presence or absence of a given protein in an organism) or a continuous code
that describes how similar the related sequences are in the different genomes. In addition,
grouping of similar protein profiles may be made wherein similar profiles are indicative of
functionally related proteins. Furthermore, the requirements for similarity can be modified
depending upon particular criteria by varying the difference in similar bit requirements. For
example, criteria requiring that the degree of similarity in the profile include all 16 bits being
identical can be set, but may be modified so that similarity in 15 bits of the 16 bits would
indicate relatedness of the protein profiles as well. Statistical methods can be used to
determine how similar two patterns must be in order to be related.
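
The adjustable similarity requirement can be illustrated with a small Python sketch that counts differing bits between two profiles. Representing a profile as a string of "0" and "1" characters, and the function names used here, are assumptions made for illustration only.

```python
def profile_differences(profile_a, profile_b):
    """Count the positions at which two equal-length binary profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same set of genomes")
    return sum(x != y for x, y in zip(profile_a, profile_b))


def profiles_related(profile_a, profile_b, max_mismatches=0):
    """True if the profiles differ in at most max_mismatches bits."""
    return profile_differences(profile_a, profile_b) <= max_mismatches
```

With `max_mismatches=0` all 16 bits must be identical; with `max_mismatches=1` agreement in 15 of the 16 bits is sufficient.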

The phylogenetic profile method is applicable to any genome including, e.g.,
viral, bacterial, archaeal or eukaryotic. The method of phylogenetic profile grouping
provides the prediction of function for a previously uncharacterized protein(s). The method
also allows prediction of new functional roles for characterized proteins based upon
functional linkages. It also provides potential informative connections (i.e., links) between
uncharacterized proteins.

To represent the subset of organisms that contain a homolog, a phylogenetic
profile is constructed for each protein. The simplest manner to represent a protein's
phylogenetic history is via a binary phylogenetic profile for each protein. This profile is a
string with N entries, each one bit, where N corresponds to the number of genomes. The
number of genomes can be any number of two or more (e.g., 2, 3, 4, 5, 10, 100, to 1000 or
more). The presence of a homolog to a given protein in the nth genome is indicated with an
entry of unity at the nth position (e.g., in a binary system an entry of 1). If no homolog is
found the entry is zero. Proteins are clustered according to the similarity of their
phylogenetic profiles. Similar profiles show a correlated pattern of inheritance, and by
implication, functional linkage. The method predicts that the functions of uncharacterized
proteins are likely to be similar to characterized proteins within a cluster.
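
A minimal Python sketch of the construction just described follows. It assumes that, for each protein, the set of genomes containing a homolog has already been determined by sequence alignment; the function names and input layout are hypothetical. Only the simplest clustering, grouping proteins with identical profiles, is shown.

```python
from collections import defaultdict


def binary_profile(genomes_with_homolog, all_genomes):
    """Build an N-entry string: '1' where the genome has a homolog, '0' otherwise."""
    return "".join("1" if g in genomes_with_homolog else "0" for g in all_genomes)


def group_by_profile(homolog_table, all_genomes):
    """Group proteins that share an identical phylogenetic profile."""
    groups = defaultdict(list)
    for protein, present_in in homolog_table.items():
        groups[binary_profile(present_in, all_genomes)].append(protein)
    return dict(groups)
```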

In order to decide whether a genome contains a protein related to another
particular protein, the query amino acid sequence is aligned with each of the proteins from
the genome(s) in question using a known alignment algorithm (see above). The
statistical significance of any alignment score is described by the probability, p, of
obtaining a higher score when the sequences are shuffled. One way to compute a p value
threshold is to first consider the total number of sequence comparisons that are being made.
If there are N proteins in a first organism's genome and M in all other genomes, this number
is N x M. If this number of comparisons were made between random sequences, it would be
expected that one pair would yield a p value of 1/(N x M). This value can be set as a
threshold. Other thresholds may be used and will be recognized by those of skill in the art.

A non-binary phylogenetic profile can be used. In this method, the
phylogenetic profile is a string of N entries where the nth entry represents the evolutionary
distance of the query protein to the homolog in the nth genome. To define an evolutionary
distance between two sequences an alignment between two sequences is performed. Such
alignments can be carried out by any number of algorithms known in the art (for examples,
see those described above). The evolution is represented by a Markov process with
substitution rates, over a fixed interval of time, given by a conditional probability matrix:

\[ p(aa \to aa') \]

where aa and aa' are any amino acids. One way to construct such a matrix is to
convert the BLOSUM62 amino acid substitution matrix (or any other amino acid
substitution matrix, e.g., PAM100, PAM250) from a log odds matrix to a conditional
probability (or transition) matrix:

\[ P(i \to j) = p(j)\, 2^{\mathrm{BLOSUM62}_{ij}/2} \tag{1} \]

P(i → j) is the probability that amino acid i will be replaced by amino acid j through
point mutations according to the BLOSUM62 scores. The p(j)'s are the abundances of amino
acid j and are computed by solving the 20 linear equations given by the normalization
conditions that:

\[ \sum_{j} P(i \to j) = 1 \tag{2} \]

The probability of this process is computed to account for the observed
alignment by taking the product of the conditional probabilities for each aligned pair:

\[ P(p) = \prod_{n} p(aa_n \to aa'_n) \tag{3} \]

A family of evolutionary models is then tested by taking powers of the
conditional probability matrix: p' = p^α(aa → aa'). The power α that maximizes P is defined
to be the evolutionary distance.
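
The search for the power that maximizes the alignment probability can be sketched in Python as below. This is only an illustration under stated assumptions: it scans integer powers rather than a continuous range, and the two-letter alphabet and transition matrix in the usage note are toy values, not a matrix derived from BLOSUM62. The function names are hypothetical.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def evolutionary_distance(aligned_pairs, transition, alphabet, max_power=50):
    """Return the integer power of the transition matrix that maximizes the
    product of conditional probabilities over the aligned residue pairs."""
    index = {residue: i for i, residue in enumerate(alphabet)}
    power = transition  # current matrix power, starting at the first power
    best_alpha, best_loglik = None, -math.inf
    for alpha in range(1, max_power + 1):
        loglik = 0.0
        for a, b in aligned_pairs:
            p = power[index[a]][index[b]]
            loglik += math.log(p) if p > 0 else -math.inf
        if loglik > best_loglik:
            best_alpha, best_loglik = alpha, loglik
        power = mat_mul(power, transition)
    return best_alpha
```

With a toy matrix `[[0.9, 0.1], [0.1, 0.9]]` over the alphabet `"AB"`, an alignment of identical residues yields the smallest distance, while an alignment with a mismatch yields a larger one.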

Many other schemes may be imagined to deduce the evolutionary distance
between two sequences. For example, one might simply count the number of positions in the
sequence where the two proteins have adopted different amino acids.

Although the phylogenetic history of an organism can be presented as a vector
(as described above), the phylogenetic profiles need not be vectors, but may be represented
by matrices. This matrix includes all the pairwise distances between a group of homologous
proteins, each one from a different organism. Similarly, phylogenetic profiles could be
represented as evolutionary trees of homologous proteins. Functional proteins could then be
clustered or grouped by matching similar trees, rather than vectors or matrices.

In order to predict function, different proteins are grouped or clustered
according to the similarity of their phylogenetic profiles. Similar profiles indicate a
correlated pattern of inheritance, and by implication, functional linkage.

Grouping or clustering may be accomplished in many ways. The simplest is
to compute the Euclidean distance between two profiles. Another method is to compute a
correlation coefficient to quantify the similarity between two profiles. All profiles within a
specified distance of the query profile are considered to be a cluster or group.
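
The two similarity measures named above can be sketched in Python as follows. Profiles are assumed to be equal-length lists of numbers (bits or evolutionary distances). The function names are hypothetical, the correlation shown is the Pearson coefficient, and returning 0.0 for a constant profile is a convention chosen for this sketch.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def correlation(a, b):
    """Pearson correlation coefficient between two profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumed convention: a constant profile carries no signal
    return cov / math.sqrt(var_a * var_b)


def cluster_around(query, profiles, max_distance):
    """Names of all profiles within max_distance of the query profile."""
    return [name for name, profile in profiles.items()
            if euclidean_distance(query, profile) <= max_distance]
```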

Typically a genome database will be used as a source of sequence
information. Where the genome database contains only the nucleic acid sequence, that
sequence is translated to an amino acid sequence in frame (if known) or in all frames if
unknown. Direct comparison of the nucleic acid sequences of two or more organisms may
be feasible but will likely be more difficult due to the degeneracy of the genetic code.
Programs capable of translating a nucleic acid sequence are known in the art or easily
programmed by those of skill in the art to recognize a codon sequence for each amino acid.
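
As a hedged illustration of such a translation program, the following Python sketch translates a DNA sequence in a chosen frame or in all six frames using the standard genetic code. The function names are hypothetical, unknown codons are rendered as "X" by assumption, and organisms using a different genetic code would need a different table.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna, frame=0):
    """Translate one reading frame (0, 1 or 2); '*' marks a stop codon."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame_translations(dna):
    """Translate the three forward and three reverse-strand frames."""
    return [translate(strand, frame)
            for strand in (dna, reverse_complement(dna))
            for frame in range(3)]
```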

The phylogenetic profile provides an indication of those proteins in each of the
at least two organisms that share some degree of homology. Such a comparison can be done
by any number of alignment algorithms known in the art or easily developed by one skilled
in the art (see, for example, those listed above, e.g., BLAST, FASTA, etc.). In addition,
thresholds can be set regarding a required degree of homology. Each protein is then grouped
at 224 with related proteins that share a similar phylogenetic profile using grouping
algorithms.

"Functionally-, Structurally- or Metabolically-Linked" Method

The "physiologic linkage" method is a computational method that detects (i.e.,
identifies) proteins, and the genes that encode them, that participate in a common functional
pathway (e.g., cell motility or cell division), that participate in the synthesis of the same or a
similar structural complex (e.g., a cell wall) or participate in the same or similar metabolic
pathway (e.g., glycolysis, lipid synthesis, and the like). Proteins within these common
functional pathway groups are examples of "functionally linked" proteins. Having a
common functional "goal" they evolve in a correlated fashion. Thus, "homologs" in
different organisms can be comparatively identified. While these detection methods are very
effective in identifying functional homologues in the same subset of organisms, functional
linkages can be made between widely genetically disparate organisms.

In one aspect, metabolic pathways are defined as links between proteins that
operate in the same metabolic pathway that can be identified by sequence identity searching,
e.g., by performing a BLAST search to find top-scoring polypeptides with high similarity
(BLAST alignment E-value < 10^-20) to polypeptides identified in a known pathway. For
example, M. tuberculosis proteins were so analyzed against E. coli proteins; MTB proteins
whose E. coli homologs (i.e., having high similarity by BLAST alignment) act adjacently in
metabolic pathways as defined in the EcoCyc database (see, e.g., Karp (1998) Nucleic Acids
Res. 26:50-53) were identified.

In another example, flagellar proteins are found in bacteria that possess
flagella but not in other organisms. Accordingly, if two proteins have homologs in the same
subset of fully sequenced organisms, they are likely to be functionally linked. The methods
of the invention use this concept to systematically map links between all the proteins coded
by a genome.

Typically, functionally linked proteins have no amino acid sequence similarity 
with each other and, therefore, cannot be linked by conventional sequence alignment 
techniques. Accordingly, the methods of the invention identify drug targets that could not be 
identified using conventional sequence comparison (i.e., sequence homology or sequence 
identity) techniques. 

Prediction of functionally linked proteins by the "phylogenetic method" can
also be used in conjunction with the "domain fusion" or "Rosetta Stone" method and also can
be filtered by other methods that predict functionally linked proteins, such as the protein
phylogenetic profile method or the analysis of correlated mRNA expression patterns. When
the Rosetta Stone predictions for S. cerevisiae were filtered by these two methods, it was
found that proteins predicted to be functionally linked by two or more of these three methods
were as likely to be functionally related as proteins that were observed to physically interact
by experimental techniques like yeast 2-hybrid methods or co-immunoprecipitation methods.

For example, a combination of these methods of prediction can be used to
establish links between proteins of closely related function. The methods of the invention
(e.g., the "Rosetta Stone" method and the "phylogenetic profile" method) can be combined
with one another or with other protein prediction methods known in the art; see, for example,
Eisen (1998) "Cluster analysis and display of genome-wide expression patterns," Proc. Natl.
Acad. Sci. USA, 95:14863-14868.
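
One simple way to combine predictions, keeping only links supported by two or more methods, can be sketched in Python as follows. The input layout (a mapping from method name to a list of protein pairs) and the function name are assumptions made for illustration.

```python
from collections import Counter


def links_supported_by(predictions, min_methods=2):
    """Return the links (as unordered pairs) predicted by at least min_methods methods."""
    counts = Counter()
    for links in predictions.values():
        # Treat (A, B) and (B, A) as the same link and count each method once.
        for pair in {frozenset(link) for link in links}:
            counts[pair] += 1
    return {pair for pair, n in counts.items() if n >= min_methods}
```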

The various techniques, methods, and variations thereof described can be 
implemented in part or in whole using computer-based systems and methods. Additionally, 
computer-based systems and methods can be used to augment or enhance the functionality 
described above, increase the speed at which the functions can be performed, and provide 
additional features and aspects as a part of or in addition to those of the invention described 
elsewhere in this document. Various computer-based systems, methods, and 
implementations in accordance with this technology are described herein. 

Proteins linked to current drug targets 

The invention also provides a novel method for identifying a polypeptide, or 
the nucleic acid sequence that encodes it, that is a target for a drug. The method analyzes the
functional relationship between at least two sequences, wherein at least one of the sequences
is a known target of a drug or encodes a polypeptide drug target. The method comprises 
identifying proteins, and the genes that encode them, that are functionally linked to the 
targets of known drugs. The functional linkage is determined by using the "domain fusion" 
method, the "phylogenetic profile" method or the "physiologic linkage" method, or a 
combination thereof, as described herein. 

Thus, this aspect of the invention provides methods for identifying drug targets
from among all or a subset of genes in a genome using computationally-determined
functional linkages. In one implementation of the method, functional linkages are calculated
using the "domain fusion" method, the "phylogenetic profile" method or the "physiologic
linkage" method, or a combination thereof, between all "query genome genes." Next, each
set of genes predicted to be functionally linked to either a known drug target or to a sequence
homolog or ortholog (defined below) of a known drug target is examined. These proteins
(and the nucleic acids that encode them) are functionally linked to known drug targets; thus,
they are operating in the same pathways or systems targeted by the known drug.
Accordingly, the methods of the invention have identified them as drug targets.
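
The selection step described above can be sketched in Python as a lookup over the computed links. This is a minimal illustration, assuming the functional links are available as pairs of gene identifiers; the function name and the placeholder identifiers in the usage note are hypothetical.

```python
def candidate_targets(links, known_targets):
    """Return genes functionally linked to at least one known drug target,
    excluding the known targets themselves."""
    known = set(known_targets)
    candidates = set()
    for a, b in links:
        if a in known and b not in known:
            candidates.add(b)
        if b in known and a not in known:
            candidates.add(a)
    return candidates
```

For instance, given links `[("target1", "geneX"), ("geneY", "geneZ")]` and the known target `"target1"`, only `"geneX"` is returned.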

This method is particularly effective for identifying drug targets in pathogens,
such as microorganisms, e.g., bacteria, viruses and the like. This method allows for the
identification of novel drug targets that cannot be identified by other techniques, such as
traditional sequence homology or sequence identity comparison techniques. Several known
drug targets in M. tuberculosis were analyzed with the methods of the invention, using
functional linkages to identify potential new drug targets in the same pathways as the known
drug targets.

There are very few drugs that are effective for anti-tuberculosis therapy, since 
the complex lipid-rich mycobacterial cell wall is impermeable to many antibacterial agents. 
Additionally, single- and multi-drug resistance is rapidly emerging against these drugs. To 
address this issue, the methods of the invention were used to identify Mycobacterium 
tuberculosis (MTB or TB) proteins that are functionally linked to the targets of known drugs. 
Inhibiting these proteins should have the same effect on the organism as the drug, since the 
same processes or pathways would be disrupted. Targeting multiple components of a given 
biochemical pathway would also diminish the opportunity for the development of resistance
because various related proteins would have to mutate against inhibitors while preserving the
overall functionality of the pathway.

A list of targets of essential anti-TB drugs (World Health Organization,
Geneva, Switzerland) was compiled. The anti-TB drugs included isoniazid, rifampicin,
ethambutol, streptomycin, pyrazinamide and thiacetazone. Although not enough is known
about the molecular basis of action of the latter two, the functional linkages of the known
drug targets were examined.

Isoniazid. This is one of the most widely used of all anti-tuberculosis drugs.
It is believed that the compound is activated by the catalase-peroxidase KatG. Once
activated, it then attaches to a nicotinamide adenine dinucleotide bound to the enoyl-acyl
carrier protein reductase InhA, resulting in the inhibition of mycolic acid biosynthesis
(see, e.g., Rozwarski (1998) Science 279:98-102).

Using the "phylogenetic profile" method, the inhA gene was "linked" to, or functionally
associated with, two polyketide synthases, pks1 and pks6 (Figure 1), both of which contain
acyl carrier protein motifs. The polyketide synthase pks6 is in turn known from established
metabolic pathways to be linked to fatty acid biosynthesis gene accD3. Further, pks6 is
linked to fadD28 and to the operon containing the genes ppsA-E, all recently reported to be
crucial for bacterial replication in host lungs (see, e.g., Cox (1999) Nature 402:79-83).

The inhA gene was also linked to an operon encoding two putative
oxidoreductases and a gene of entirely unknown function. The inhA gene was further linked
to a second operon that includes pepR and gpsI. PepR is a protease whose Bacillus subtilis
homolog is adjacent to the genes coding for enzymes that synthesize diaminopimelate, a
component of the cell wall incorporated by the murE gene product and diaminopicolinate
(see, e.g., Chen (1993) J. Biol. Chem. 268:9448-9465). PepR is an ortholog of an essential
yeast gene and is likely to be essential for MTB (see below). GpsI is a putative
multifunctional enzyme involved in guanosine pentaphosphate synthesis and
polyribonucleotide nucleotidyltransfer. The high reliability of the predicted functional link
between gpsI and pepR and the absence of eukaryotic homologs suggest that gpsI could be a
promising target for drug design.

Rifampicin. This compound, along with the related rifabutin and KRM-1648,
is believed to act by directly targeting the RNA polymerase β-subunit (rpoB), given that
96% of resistant isolates were found to have mutations of various types in a limited region of
the rpoB gene (see, e.g., Yang (1998) J. Antimicrob. Chemother. 42:621-628).

Using the methods of the invention, as expected, functional linkages were 
found to another RNA polymerase subunit, rpoC, as well as to various tRNA synthases and 
ribosomal proteins. However, no functional links to uncharacterized proteins were found. 

Ethambutol. This drug is effective against tuberculosis when used in
combination with isoniazid. It is believed that the drug interacts with the EmbB protein, a
probable arabinosyl-transferase, inhibiting the biosynthesis of arabinan, a component of
cell-envelope lipids. As with rifampicin, the evidence for this interaction is indirect, since
mutations in the embB gene are responsible for ethambutol resistance (see, e.g., Lety (1997)
Antimicrob. Agents Chemother. 41:2629-2633).

The "gene proximity" method correctly clusters embB with embA (Rv3794). 
This cluster is linked to a set of mostly uncharacterized genes by the "phylogenetic profile" 
method; see Figure 2, which shows an analysis of EmbB, the target for the anti-tuberculosis 
drug Ethambutol, and shows functional linkages to genes mostly of unknown function but 
with some indications of localization at the bacterial membrane. 

Two of the uncharacterized genes, Rv1706c and Rv1800, belong to the
abundant PE/PPE family of proteins hypothesized to be a source of antigenic variation with
the potential ability to interfere with immune responses by inhibiting antigen processing (see,
e.g., Cole (1998) Nature 393:537-544). A third uncharacterized gene, Rv1967, belongs to
one of the four copies of the mce operon. This operon consists of eight genes coding for
integral membrane proteins and proteins that have N-terminal signal sequences or
hydrophobic segments and are believed to be involved in pathogenicity (see, e.g., Cole
(1998) supra). Rv0528 codes for a hypothetical membrane protein and Rv2159c corresponds
to the murF gene, which participates in the biosynthesis of peptidoglycan precursors.

The majority of the "links," or functionally associated sequences, involved
proteins associated with processes related to the bacterial cell wall (with the possible
exception of atsA and the putative choline dehydrogenase Rv1279, whose relationship to
these processes is not immediately obvious). The proteins of unknown function are therefore
also expected to play some role in these processes and are thus of interest as potential drug
targets.

Streptomycin. This drug acts by binding to the 16S rRNA and inhibits protein
synthesis. Resistance to this compound emerges from mutations in the corresponding gene
(rrs), as well as in the gene encoding the ribosomal protein S12 (rpsL). Disruptions to
RpsL effect streptomycin resistance by altering the higher order structure of 16S rRNA (see,
e.g., Sreevatsan (1996) Antimicrob. Agents Chemother. 40:1024-1026).

Although streptomycin does not directly target RpsL, the functional links
generated for this protein were examined, as any target whose inhibition will ultimately
disrupt bacterial protein synthesis is likely to be an effective antigrowth/anti-microbial
target. As with the rifampicin target, the only functional linkages found for this protein were
the expected protein synthesis-related proteins, including large ribosomal subunit proteins
L2, L5, L11, and L14; small ribosomal subunit proteins S4, S5, S7, S8, and S11; elongation
factors fusA and Ef-Tu; the chaperones GroEL, clpB and ftsH; and the Clp protease subunits
clpC and clpX.

Proteins linked to cell-wall related proteins 

The invention also provides a novel method for identifying a nucleic acid or a 
polypeptide sequence in an organism that is linked to a cell-wall related protein. The method 
analyzes the functional relationship between at least two sequences, wherein at least one of 
the sequences is a cell-wall related protein, or, the sequence is a nucleic acid sequence that 
encodes a cell-wall related protein. The method comprises identifying proteins, and the 
genes that encode them, that are functionally linked to a cell-wall related protein. The 
functional linkage is determined by using the "domain fusion" method, the "phylogenetic 
profile" method or the "physiologic linkage" method, or a combination thereof, as described 
herein. 

Approximately eleven M. tuberculosis proteins are indicated by sequence
homology to be penicillin-binding proteins, thought to synthesize peptidoglycan in the course
of cell elongation and cell wall metabolism (see, e.g., Broome-Smith (1985) Eur. J. Biochem.
147:437-446). Using the methods of the invention, the functional linkages found for these
proteins map out many of the known cell wall synthetic enzymes and reveal more than 10
proteins of unknown function that may also participate in cell wall metabolism. Figure 3
shows an analysis of five of the approximately eleven MTB proteins presumed to bind
penicillin to reveal functional linkages to various potential operons consisting of genes
involved in various aspects of cell wall metabolism, including cell shape determination and
peptidoglycan biosynthesis, as well as more than ten genes of unknown function, which we
can now associate with cell wall metabolism.

Three of the proteins (pbpA, pbpB, and ponA1) reside in conserved gene
clusters, presumably operons. Other genes in the clusters around pbpA and pbpB are also
implicated in cell wall metabolism. For example, pbpA resides next to rodA, a
membrane-associated protein whose E. coli homolog determines cell shape and is required for
enzymatic activity of penicillin binding proteins (see, e.g., Matsuzawa (1989) J. Bacteriol.
171:558-560). Likewise, pbpB resides next to six peptidoglycan biosynthesis genes and the
two septum and cell wall formation proteins ftsW and ftsZ.

Two additional gene clusters were linked to these penicillin binding proteins 
by either the "phylogenetic profile" or "Rosetta Stone" pattern methods of the invention. 
One cluster is composed of the peptidoglycan synthetic protein murB and a putative 
membrane protein of unknown function that the functional linkages suggest is involved in 
cell wall metabolism. The second gene cluster contains four genes, three of which are 
predicted to reside in the cell membrane or envelope. Therefore, the uncharacterized genes 
in these clusters are likely to be involved in cell wall metabolism, closely related to the 
function of the penicillin binding proteins and are therefore promising drug targets. 

Another gene linked to cell wall metabolism by the computationally-derived
linkage methods of the invention is gcpE; see Figure 4, which shows that the uncharacterized
gene gcpE, known to be essential for bacterial survival (see, e.g., Baker (1992) FEMS
Microbiol. Lett. 73:175-180), is predicted to be involved in cell wall metabolism through its
functional links to a putative membrane protein and two murein hydrolase genes, lytB1 and
lytB2, involved in cell separation. The genes forming a putative operon with gcpE are
proposed as potential drug targets. The functional linkages place gcpE in a conserved gene
cluster with two genes of unknown function, one of which encodes a membrane protein.
However, the three genes show correlated inheritance with two homologs of lytB, an E. coli
gene involved in penicillin tolerance (see, e.g., Gustafson (1993) J. Bacteriol. 175:1203-1205)
and recently shown to encode a murein hydrolase essential for cell separation (see, e.g.,
Garcia (1999) Mol. Microbiol. 31:1275-1277). The uncharacterized proteins from this
cluster are therefore expected to participate in processes similar to GcpE and might therefore
be promising drug targets.

Proteins linked to potentially novel pathways 

The invention also provides a novel method for identifying a polypeptide, or a 
nucleic acid that encodes it, that is linked to potentially novel biochemical (e.g., biosynthetic, 
metabolic) pathways. The method analyzes the functional relationship between at least two 
sequences, wherein at least one of the sequences is associated with a biochemical pathway, 
such as a pathway in a microorganism that enables the pathogen to evade an immune process. 
The method comprises identifying proteins, and the genes that encode them, that are 
functionally linked to the pathway-linked sequences. The functional linkage is determined 
by using the "domain fusion" method, the "phylogenetic profile" method or the "physiologic 
linkage" method, or a combination thereof, as described herein. 

For example, the htrA gene encodes a putative heat shock protein 
homologous to HtrA from Salmonella typhimurium, a serine protease that degrades aberrant 
periplasmic proteins. Mutations in this protein have been linked with reduced viability in 
host macrophages (see, e.g., Johnson (1991) Mol. Microbiol. 5:401-407). Thus, it was 
decided to investigate the function of htrA. Using the methods of the invention, results 
indicated that the htrA protein is part of a process that has not yet been characterized. The 
gene is predicted with very high reliability to function with the uncharacterized gene 
Rv1224c (see Figure 5, which shows the involvement of htrA in a potentially novel pathway). 
The gene encoding the putative heat shock protein HtrA is functionally linked to a set of 
genes mostly of unknown function, suggesting the existence of a novel pathway. The 
partially characterized proteins suggest that the pathway relates to membrane-associated 
processes such as signaling and/or transport. The lack of eukaryotic homologs for most of 
the genes linked to htrA suggests that proteins of this pathway could be promising drug 
targets. 

Through its phylogenetic profile, htrA is linked to a group of uncharacterized 
proteins, including a putative lipid esterase (Rv1900c), an ABC transporter (Rv3783) and the 
uncharacterized protein Rv1216c, which has weak homology to the laminin B receptor of 
Xenopus laevis, suggesting that it might be a membrane protein. From this analysis, it can be 
concluded that htrA is part of a novel pathway that involves membrane-associated processes, 
such as signaling and/or transport. Because the majority of the proteins linked to htrA have 
no eukaryotic homologs, and given the importance of htrA in S. typhimurium pathogenesis, 
this pathway represents another potential source of novel targets for anti-tuberculosis drugs. 

Proteins linked to essential proteins 

The invention also provides a novel method for identifying a polypeptide, or 
the nucleic acid sequence that encodes it, that is linked to an essential protein (e.g., a protein 
necessary for the growth of an organism, such as a bacterium). The method analyzes the 
functional relationship between at least two sequences, wherein at least one of the sequences 
is linked to an essential protein, or, the sequence is a nucleic acid sequence that itself is 
essential or encodes a polypeptide linked to an essential protein. The functional linkage is 
determined by using the "domain fusion" method, the "phylogenetic profile" method or the 
"physiologic linkage" method, or a combination thereof, as described herein. 

For example, the MIPS database (Munich Information Center for Protein 
Sequences; MIPS provides access through its WWW server to a spectrum of generic 
databases, including PEDANT, MYGD, MAID, MEST, the PIR-International Protein 
Sequence Database, the protein family database PROTFAM, the MITOP database, and the 
all-against-all FASTA database; see, e.g., Mewes (1999) Nucleic Acids Res. 27:44-48) 
contains a list of 734 genes that are essential for Saccharomyces cerevisiae viability (see, 
e.g., Mewes (1999) supra). A list of Mycobacterium tuberculosis genes orthologous to these 
essential genes was generated. Using the methods of the invention, 60 such genes were 
found. The products of these genes have a high likelihood of also being essential to the 
tuberculosis bacterium and therefore could be promising therapeutic targets. Furthermore, 
since the list of essential genes came from a eukaryote, there is a significant chance that these 
genes would also be found in the human genome. 

Automatic Method to Identify Drug Targets from Functional Linkages 

One aspect of the invention provides a computational method to identify 
potential drug targets among the proteins expressed by a genome. This aspect takes 
advantage of the functional linkages calculated between genes in a genome using the 
methods described herein, as well as the detection of sequence homology and the knowledge 
of a set of lethal or "essential" genes in one or more organisms. 

To identify drug targets in a query genome, the sequence homology between 
all of the genes in that genome and all of the genes in the genome of an organism for which 
essential genes are known is calculated. For example, as discussed herein, the query genome 
is Mycobacterium tuberculosis (TB) and the genome with known essentials is the yeast S. 
cerevisiae. Sequence homology between all TB genes and all yeast genes was calculated 
using the methods of the invention. 

"Equivalent" or "orthologous" genes were also identified by another aspect of 
the invention that comprises doing a reverse sequence search (e.g., yeast vs. TB) and then 
choosing pairs of genes that are each other's best-scoring match in the two searches. In one 
exemplary aspect, MTB orthologs of Saccharomyces cerevisiae genes were generated by 
finding all pairs of genes (TBi, SCj) where TBi was the top hit from a BLAST search of the 
yeast gene SCj against the MTB genome, SCj was the top hit from a BLAST search of the 
MTB gene TBi against the Saccharomyces cerevisiae genome and both top hits had a 
BLAST E-value ≤ 1 x 10⁻⁵. 
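
The symmetric best-hit rule described above can be illustrated with a short sketch. It is not the implementation used for the invention. It assumes that the top BLAST hit and its E-value have already been extracted for every gene in both search directions; the function name reciprocal_best_hits, the dictionary layout and the gene identifiers are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed input format: query gene -> (top hit in the other genome, E-value).
BestHits = Dict[str, Tuple[str, float]]


def reciprocal_best_hits(
    tb_vs_yeast: BestHits,
    yeast_vs_tb: BestHits,
    max_evalue: float = 1e-5,
) -> List[Tuple[str, str]]:
    """Return (TB gene, yeast gene) pairs that are each other's top hit.

    A pair is kept only when both top hits have an E-value at or below
    max_evalue, mirroring the threshold stated in the text.
    """
    orthologs = []
    for tb_gene, (yeast_gene, evalue_forward) in tb_vs_yeast.items():
        reverse = yeast_vs_tb.get(yeast_gene)
        if reverse is None:
            continue  # the yeast gene had no hit in the TB genome
        back_gene, evalue_reverse = reverse
        if (
            back_gene == tb_gene
            and evalue_forward <= max_evalue
            and evalue_reverse <= max_evalue
        ):
            orthologs.append((tb_gene, yeast_gene))
    return sorted(orthologs)


# Made-up identifiers, for illustration only.
tb_hits = {"TB1": ("SC1", 1e-30), "TB2": ("SC1", 1e-10), "TB3": ("SC3", 1e-3)}
yeast_hits = {"SC1": ("TB1", 1e-25), "SC3": ("TB3", 1e-3)}
print(reciprocal_best_hits(tb_hits, yeast_hits))
```

In this toy input, TB2 is rejected because SC1 points back to TB1 rather than TB2, and the TB3/SC3 pair is rejected because its E-values exceed the threshold even though the hits are symmetric.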

For example, a TB gene is an ortholog of a yeast gene if the yeast gene is the 
best scoring sequence match when yeast is searched with the TB gene, and the TB gene is the 
best scoring sequence match when TB is searched with the yeast gene. We define these 
"symmetric" pairs as "orthologs." 

After identifying orthologs between the query genome and the genome with 
known essential genes, a set of query genome genes that are orthologs of known essential 
genes in the other genome was chosen. These genes were designated the set of "putative 
essentials". For the purposes of the algorithm of the invention, these query genome genes are 
assumed to be essential genes, since they are the equivalents of essential genes in another 
genome. These genes act as "markers" or indicators of essential pathways in the query 
genome. One could supplement this set with genes already known to be essential in the 
query organism. Functional linkages (determined by the methods of the invention) between 
all query genome genes were examined. The query genome genes linked to any of the 
putative essential genes were collected. This set of genes was designated as the "predicted 
members of essential pathways." These genes are likely to be involved in important 
pathways, since the (predicted) pathways have members that are putative essentials. Lastly, 
the method removes from the set of genes in predicted essential pathways all of those genes 
that have sequence homology to eukaryotic genes or proteins. The genes that remain after 
this filtering step are the predicted drug targets for the query organism. 
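
The three steps described above (mark putative essentials, collect the genes functionally linked to them, and filter out genes with eukaryotic homologs) can be sketched as plain set operations. This is a minimal illustration under stated assumptions, not the algorithm as actually implemented: the function name predict_drug_targets, the argument layout and the gene names are hypothetical, and the functional linkages and homology calls are assumed to have been computed beforehand.

```python
from typing import Dict, Iterable, Set


def predict_drug_targets(
    orthologs: Dict[str, str],
    essential_in_reference: Set[str],
    linkages: Dict[str, Set[str]],
    has_eukaryotic_homolog: Set[str],
    known_essentials: Iterable[str] = (),
) -> Set[str]:
    """Sketch of the filtering pipeline.

    orthologs:              query gene -> orthologous reference (e.g., yeast) gene
    essential_in_reference: reference genes known to be essential
    linkages:               query gene -> query genes functionally linked to it
    has_eukaryotic_homolog: query genes with sequence homology to eukaryotes
    known_essentials:       optional query genes already known to be essential
    """
    # Step 1: putative essentials are orthologs of essential reference genes,
    # optionally supplemented with genes known to be essential in the query.
    putative_essentials = {
        gene for gene, ref in orthologs.items() if ref in essential_in_reference
    }
    putative_essentials |= set(known_essentials)

    # Step 2: predicted members of essential pathways are the genes linked
    # to any putative essential gene.
    pathway_members: Set[str] = set()
    for gene in putative_essentials:
        pathway_members |= linkages.get(gene, set())

    # Step 3: remove genes that have eukaryotic homologs.
    return pathway_members - has_eukaryotic_homolog


# Made-up gene names, for illustration only.
targets = predict_drug_targets(
    orthologs={"q1": "y1", "q2": "y2"},
    essential_in_reference={"y1"},
    linkages={"q1": {"q3", "q4"}, "q2": {"q5"}},
    has_eukaryotic_homolog={"q4"},
)
print(sorted(targets))
```

Representing each stage as a set keeps the intermediate results ("putative essentials", "predicted members of essential pathways") available for inspection, which matches how the text names them.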

As a benchmark, this method was applied to the M. tuberculosis genome. Of 
the over 3900 genes in TB, 11 were identified as potential drug targets. Comparing this list 
of 11 predicted targets to the fewer than 10 known anti-TB drug targets, one gene was a 
known drug target and one was linked to a known drug target. Accordingly, the algorithm of 
the invention performed significantly better than a random choice of genes. 
A rough estimate of statistical significance suggests that one would expect to see 2 of 10 
known drug targets in a sample of 11 out of 3900 genes only 3.8 times out of 10,000 trials 
(probability of occurring by random chance of 3.8 x 10⁻⁴). Therefore, this embodiment of the 
method is an entirely computational algorithm drawing on the demonstrated ability of the 
general methods of the invention to predict functional linkages between genes and to 
effectively identify drug targets in bacteria. The effectiveness of this method to identify 
novel drug targets was clearly demonstrated when the algorithm was applied to the M. 
tuberculosis genome. 
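
The text does not state how the rough significance estimate was obtained. As a hedged illustration, the sketch below computes the chance of drawing at least 2 of 10 known targets in a random sample of 11 genes out of 3900 in two plausible ways: an exact hypergeometric tail and a Poisson approximation. Both function names are hypothetical, and neither calculation is claimed to be the one used for the figure quoted above; they are offered only to show that a random draw yields a probability of the same order of magnitude.

```python
from math import comb, exp


def hypergeom_tail(population: int, successes: int, draws: int, at_least: int) -> float:
    """P(X >= at_least) when sampling `draws` items without replacement
    from `population` items, of which `successes` are known targets."""
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(at_least, upper + 1)
    )
    return favourable / total


def poisson_tail_at_least_two(mean: float) -> float:
    """P(X >= 2) for a Poisson variable with the given mean."""
    return 1.0 - exp(-mean) * (1.0 + mean)


# Figures taken from the text: 3900 genes, 10 known targets, 11 predictions.
p_exact = hypergeom_tail(population=3900, successes=10, draws=11, at_least=2)
p_poisson = poisson_tail_at_least_two(11 * 10 / 3900)
print(f"hypergeometric tail: {p_exact:.2e}")
print(f"Poisson approximation: {p_poisson:.2e}")
```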

The specific inhibition of the MTB homologs might be difficult. To address 
this issue, using the methods of the invention, functional links to the essential genes were 
searched. Linked genes were selected that either do not have homologs in yeast or whose 
products have an enzymatic activity known to be absent in human cells. Using the 
highest confidence data, functional links for 23 of the genes (indicated in bold in Table 1) 
were found. 
Eight of these were linked to 12 unique MTB genes that satisfied the criteria 
of the invention's methods (Table 1). Exemplary findings include: 

(1) the gene folP, which encodes the enzyme dihydropteroate synthase 
(DHPS) known to be the target of sulfonamide antibacterial drugs. Although it is found in 
some eukaryotes, DHPS activity is not found in human cells (see, e.g., Huovinen (1995) 
Antimicrob. Agents Chemother. 39:279-289). 

(2) the product of the gene folK, a 7,8-dihydro-6-hydroxymethylpterin 
pyrophosphokinase, has recently been proposed as a target for broad-spectrum 
antibacterial drugs (see, e.g., Stammers (1999) FEBS Lett. 456:49-53). 

(3) the gene gpsI is not only strongly linked to the essential yeast gene pepR, 
but it is also functionally linked to inhA, the target of the drug isoniazid (see above), making 
it a very compelling candidate for drug design. 

Table 2. Subset of genes from Table 1 that are functionally linked to genes without 
yeast homologs. 

Gene      Link 1     Comments
--------  ---------  ------------------------------------------------------------
Rv0005    Rv0002     dnaN  DNA polymerase III, β-subunit
          Rv0003     recF  DNA replication and SOS induction
          Rv0006     gyrA  DNA gyrase subunit A

Rv0350    Rv0351     grpE  stimulates DnaK ATPase activity
          Rv0352     dnaJ  acts with GrpE to stimulate DnaK ATPase

Rv1010    Rv1008     Similar to E. coli hypothetical protein YcfH
          Rv1009     Possible lipoprotein, similar to various other MTB proteins
          Rv1011     Similar to E. coli hypothetical protein YcbH

Rv2439c   Rv2427c    proA  γ-glutamyl phosphate reductase
          Rv2440c    obg   Obg GTP-binding protein
          Rv2441c    rpmA  50S ribosomal protein L27
          Rv2442c    rplU  50S ribosomal protein L21

Rv2782c   Rv2783c    gpsI  pppGpp synthase and polyribonucleotide phosphorylase

Rv3598c   Rv3600c    similar to Bacillus subtilis hypothetical protein YacB
          Rv3606c    folK  7,8-dihydro-6-hydroxymethylpterin pyrophosphokinase
          Rv3607c    folX  may be involved in folate biosynthesis
          Rv3608c*   folP  dihydropteroate synthase (DHPS)
          Rv3610c    ftsH  inner membrane protein, chaperone

Rv3608c   Rv3598c    lysS  lysyl-tRNA synthase
          Rv3600c    similar to Bacillus subtilis hypothetical protein YacB
          Rv3606c    folK  7,8-dihydro-6-hydroxymethylpterin pyrophosphokinase
          Rv3607c    folX  may be involved in folate biosynthesis
          Rv3609c    folE  GTP cyclohydrolase I
          Rv3610c    ftsH  inner membrane protein, chaperone

Rv3609c   Rv3606c    folK  7,8-dihydro-6-hydroxymethylpterin pyrophosphokinase
          Rv3607c    folX  may be involved in folate biosynthesis
          Rv3608c*   folP  dihydropteroate synthase (DHPS)

Genes without yeast homologs shown in boldface 

* DHPS activity is found in some eukaryotic cells but not in human cells 



In summary, the methods of the invention allowed identification of this 
combination of functional linkages to essential genes. This information, together with the 
lack of eukaryotic homologs for these genes, makes this group of proteins promising drug 
targets, particularly because their inhibition is expected to disrupt vital bacterial processes 
with a low likelihood of toxicity from the inhibition of a host equivalent. 

Computer Implementation 

The various techniques, methods, and aspects of the invention described 
herein can be implemented in part or in whole using computer-based systems and methods. 
Additionally, computer-based systems and methods can be used to augment or enhance the 
functionalities and algorithms described herein, increase the speed at which the functions can 
be performed, and provide additional features and aspects as a part of or in addition to those 
of the invention described elsewhere in this document. Various exemplary computer-based 
systems, methods and implementations in accordance with the above-described technology 
are presented herein. 

The processor-based system can include a main memory, such as a random 
access memory (RAM), and can also include a secondary memory. The secondary memory 
can include, for example, a hard disk drive and/or a removable storage drive, representing a 
floppy disk drive, a magnetic tape drive, an optical disk drive, etc. The removable storage 
drive reads from and/or writes to a removable storage medium. Removable storage media 
can be a floppy disk, magnetic tape, an optical disk, and the like, which can be read by and 
written to by the removable storage drive. The removable storage media can include a 
computer usable storage medium having stored therein computer software and/or data. 

In alternative embodiments, secondary memory may include other similar 
means for allowing computer programs or other instructions to be loaded into a computer 
system. Such means can include, for example, a removable storage unit and an interface. 
Examples of such can include a program cartridge and cartridge interface (such as that found 
in video game devices), a movable memory chip (such as an EPROM, or PROM) and 
associated socket, and other removable storage units and interfaces that allow software and 
data to be transferred from the removable storage unit to the computer system. 

The computer system can also include a communications interface. 
Communications interfaces allow software and data to be transferred between the computer 
system and external devices. Examples of communications interfaces include modems, 
network interfaces (such as, for example, an Ethernet card), communications ports, PCMCIA 
slots and cards, and the like. Software and data transferred via a communications interface 
can be in the form of signals that can be electronic, electromagnetic, optical or other signals 
capable of being received by a communications interface. These signals can be provided to 
the communications interface via a channel capable of carrying signals and can be implemented 
using a wireless medium, wire or cable, fiber optics or other communications medium. Some 
examples of a channel can include a phone line, a cellular phone link, an RF link, a network 
interface, and other communications channels. 

As used herein, the terms "computer program medium" and "computer usable 
medium" are used to generally refer to media such as a removable storage device, a disk 
capable of installation in a disk drive, and signals on a channel, or equivalents thereof. These 
computer program products are means for providing software or program instructions to 
computer systems. Computer programs (also called computer control logic) can be stored in 
main memory and/or secondary memory. Computer programs can also be received via a 
communications interface. Such computer programs, when executed, enable the computer 
system to perform the features of the present invention as discussed herein. Computer 
programs, when executed, enable the processor to perform the features of the present 
invention. Accordingly, in one aspect of the invention, such computer programs represent 
controllers of the computer system. 

In another aspect of the invention, the methods and algorithms are 
implemented using software. The software may be stored in, or transmitted via, a computer 
program product and loaded into a computer system using a removable storage drive, hard 
drive or communications interface. The control logic (software), when executed by the 
processor, causes the processor to perform the functions of the invention as described herein. 

In another aspect, the elements are implemented primarily in hardware using, 
for example, hardware components such as PALs, application specific integrated circuits 
(ASICs) or other hardware components. Implementation of a hardware state machine so as 
to perform the functions described herein will be apparent to persons skilled in the relevant 
art(s). In yet another embodiment, elements are implemented using a combination of both 
hardware and software. 

In another aspect, the computer-based methods can be accessed or 
implemented over the World Wide Web by providing access via a Web Page to the methods 
of the present invention. Accordingly, the Web Page is identified by a Uniform Resource 
Locator (URL). The URL denotes both the server machine, and the particular file or page on 
that machine. In this embodiment, it is envisioned that a consumer or client computer system 
interacts with a browser to select a particular URL, which in turn causes the browser to send 
a request for that URL or page to the server identified in the URL. Typically the server 
responds to the request by retrieving the requested page, and transmitting the data for that 
page back to the requesting client computer system (the client/server interaction is typically 
performed in accordance with the hypertext transfer protocol ("HTTP")). The selected page 
is then displayed to the user on the client's display screen. The client may then cause the 
server containing a computer program of the present invention to launch an application 
comprising a method of the invention, for example, to identify a nucleic acid or a polypeptide 
sequence that may be a target for a drug comprising the steps of (a) providing a first nucleic 
acid or a polypeptide sequence that is known to be a drug target; (b) providing an algorithm 
capable of analyzing a functional relationship between nucleic acid or polypeptide sequences 
selected from the group consisting of a "domain fusion" method, a "phylogenetic profile" 
method and a "physiologic linkage" method; and, (c) comparing the first nucleic acid or the 
polypeptide drug target sequence to a plurality of sequences using at least one algorithm to 
identify a second sequence that has a functional relationship to the first sequence, thereby 
identifying a nucleic acid or a polypeptide sequence that may be a target for a drug, based on 
a query sequence provided by the client. 

Nucleic Acids and Polypeptides 

The invention also provides isolated nucleic acids and polypeptides 
comprising the sequences as set forth in Table 3 and Table 4 (below). As used herein, 
"isolated," when referring to a molecule or composition, such as, e.g., an isolated infected 
cell comprising a nucleic acid sequence derived from a library of the invention, means that 
the molecule or composition (including, e.g., a cell) is separated from at least one other 
compound, such as a protein, DNA, RNA, or other contaminants with which it is associated 
in vivo or in its naturally occurring state. Thus, a nucleic acid or polypeptide or peptide 
sequence is considered isolated when it has been isolated from any other component with 
which it is naturally associated. An isolated composition can, however, also be substantially 
pure. An isolated composition can be in a homogeneous state. It can be in a dry or an 
aqueous solution. Purity and homogeneity can be determined, e.g., using any analytical 
chemistry technique, as described herein. 

The term "nucleic acid" or "nucleic acid sequence" refers to a 
deoxyribonucleotide or ribonucleotide oligonucleotide, including single- or double-stranded, or 
coding or non-coding (e.g., "antisense") forms. The term encompasses nucleic acids, i.e., 
oligonucleotides, containing known analogues of natural nucleotides. The term also 
encompasses nucleic-acid-like structures with synthetic backbones, see e.g., Oligonucleotides 
and Analogues, a Practical Approach, ed. F. Eckstein, Oxford Univ. Press (1991); Antisense 
Strategies, Annals of the N.Y. Academy of Sciences, Vol 600, Eds. Baserga et al. (NYAS 
1992); Milligan (1993) J. Med. Chem. 36:1923-1937; Antisense Research and Applications 
(1993, CRC Press), WO 97/03211; WO 96/39154; Mata (1997) Toxicol. Appl. Pharmacol. 
144:189-197; Strauss-Soukup (1997) Biochemistry 36:8692-8698; Samstag (1996) Antisense 
Nucleic Acid Drug Dev 6:153-156. As used herein, the "sequence" of a nucleic acid or gene 
refers to the order of nucleotides in the polynucleotide, including either or both strands (sense 
and antisense) of a double-stranded DNA molecule, e.g., the sequence of both the coding 
strand and its complement, or of a single-stranded nucleic acid molecule (sense or antisense). 
For example, in alternative embodiments, promoters drive the transcription of sense and/or 
antisense polynucleotide sequences of the invention, as exemplified by Table 3. 

The terms "polypeptide," "protein," and "peptide" include compositions of the 
invention that also include "analogs," or "conservative variants" and "mimetics" 
("peptidomimetics") with structures and activity that substantially correspond to the 
exemplary sequences, such as the sequences in Table 4. Thus, the terms "conservative 
variant" or "analog" or "mimetic" also refer to a polypeptide or peptide which has a modified 
amino acid sequence, such that the change(s) do not substantially alter the polypeptide's (the 
conservative variant's) structure and/or activity (e.g., immunogenicity, ability to bind to 
human antibodies, etc.), as defined herein. These include conservatively modified variations 
of an amino acid sequence, i.e., amino acid substitutions, additions or deletions of those 
residues that are not critical for protein activity, or substitution of amino acids with residues 
having similar properties (e.g., acidic, basic, positively or negatively charged, polar or 
non-polar, etc.) such that the substitutions of even critical amino acids do not substantially alter 
structure and/or activity. Conservative substitution tables providing functionally similar 
amino acids are well known in the art. For example, one exemplary guideline to select 
conservative substitutions includes (original residue followed by exemplary substitution): 
ala/gly or ser; arg/lys; asn/gln or his; asp/glu; cys/ser; gln/asn; gly/asp; gly/ala or pro; 
his/asn or gln; ile/leu or val; leu/ile or val; lys/arg or gln or glu; met/leu or tyr or ile; phe/met 
or leu or tyr; ser/thr; thr/ser; trp/tyr; tyr/trp or phe; val/ile or leu. An alternative exemplary 
guideline uses the following six groups, each containing amino acids that are conservative 
substitutions for one another: 1) Alanine (A), Serine (S), Threonine (T); 2) Aspartic acid (D), 
Glutamic acid (E); 3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine (K); 5) 
Isoleucine (I), Leucine (L), Methionine (M), Valine (V); and 6) Phenylalanine (F), Tyrosine 
(Y), Tryptophan (W); (see also, e.g., Creighton (1984) Proteins, W.H. Freeman and 
Company; Schulz and Schirmer (1979) Principles of Protein Structure, Springer-Verlag). One 
of skill in the art will appreciate that the above-identified substitutions are not the only 
possible conservative substitutions. For example, for some purposes, one may regard all 
charged amino acids as conservative substitutions for each other whether they are positive or 
negative. In addition, individual substitutions, deletions or additions that alter, add or delete 
a single amino acid or a small percentage of amino acids in an encoded sequence can also be 
considered "conservatively modified variations." 
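
The six-group guideline above lends itself to a simple lookup. The sketch below is only an illustration of that one guideline; as the text notes, other definitions of a conservative substitution are possible. The names SIX_GROUPS, is_conservative and nonconservative_positions are hypothetical, and residues outside the six groups (for example glycine) are treated here as having no conservative partner, which is an assumption of the sketch rather than a statement from the text.

```python
from typing import List

# The six groups listed in the text, written with one-letter residue codes.
SIX_GROUPS = ("AST", "DE", "NQ", "RK", "ILMV", "FYW")
GROUP_OF = {
    residue: index
    for index, group in enumerate(SIX_GROUPS)
    for residue in group
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if two different residues fall in the same one of the six groups."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues are not a substitution at all
    return a in GROUP_OF and GROUP_OF.get(a) == GROUP_OF.get(b)


def nonconservative_positions(reference: str, variant: str) -> List[int]:
    """Zero-based positions where the variant differs from the reference
    by a substitution that is not conservative under the six-group rule."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        i
        for i, (a, b) in enumerate(zip(reference, variant))
        if a.upper() != b.upper() and not is_conservative(a, b)
    ]


# Toy peptides, for illustration only.
print(nonconservative_positions("ASDK", "SSEG"))
```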

The terms "mimetic" and "peptidomimetic" refer to a synthetic chemical 
compound that has substantially the same structural and/or functional characteristics of the 
polypeptides of the invention (e.g., ability to bind, or "capture," human antibodies in an 
ELISA). The mimetic can be either entirely composed of synthetic, non-natural analogues of 
amino acids, or can be a chimeric molecule of partly natural peptide amino acids and partly 
non-natural analogs of amino acids. The mimetic can also incorporate any amount of natural 
amino acid conservative substitutions as long as such substitutions also do not substantially 
alter the mimetic's structure and/or activity. As with polypeptides of the invention which are 
conservative variants, routine experimentation will determine whether a mimetic is within the 
scope of the invention, i.e., that its structure and/or function is not substantially altered. 
Polypeptide mimetic compositions can contain any combination of non-natural structural 
components, which are typically from three structural groups: a) residue linkage groups other 
than the natural amide bond ("peptide bond") linkages; b) non-natural residues in place of 
naturally occurring amino acid residues; or c) residues which induce secondary structural 
mimicry, i.e., to induce or stabilize a secondary structure, e.g., a beta turn, gamma turn, beta 
sheet, alpha helix conformation, and the like. A polypeptide can be characterized as a 
mimetic when all or some of its residues are joined by chemical means other than natural 
peptide bonds. Individual peptidomimetic residues can be joined by peptide bonds, other 
chemical bonds or coupling means, such as, e.g., glutaraldehyde, N-hydroxysuccinimide 
esters, bifunctional maleimides, N,N'-dicyclohexylcarbodiimide (DCC) or 
N,N'-diisopropylcarbodiimide (DIC). Linking groups that can be an alternative to the traditional 
amide bond ("peptide bond") linkages include, e.g., ketomethylene (e.g., -C(=O)-CH2- for 
-C(=O)-NH-), aminomethylene (CH2-NH), ethylene, olefin (CH=CH), ether (CH2-O), 
thioether (CH2-S), tetrazole (CN4-), thiazole, retroamide, thioamide, or ester (see, e.g., 
Spatola (1983) in Chemistry and Biochemistry of Amino Acids, Peptides and Proteins, Vol. 
7, pp 267-357, "Peptide Backbone Modifications," Marcel Dekker, NY). A polypeptide can 
also be characterized as a mimetic by containing all or some non-natural residues in place of 
naturally occurring amino acid residues; non-natural residues are well described in the 
scientific and patent literature. 

The invention comprises nucleic acids comprising sequences as set forth in 
Table 3, or comprising nucleic acids encoding the polypeptides as set forth in Table 4, 
operably linked to a transcriptional regulatory sequence. As used herein, the term "operably 
linked" refers to a functional relationship between two or more nucleic acid (e.g., DNA) 
segments. Typically, it refers to the functional relationship of a transcriptional regulatory 
sequence to a transcribed sequence. For example, a promoter (defined below) is operably 
linked to a coding sequence, such as a nucleic acid of the invention, if it stimulates or 
modulates the transcription of the coding sequence in an appropriate host cell or other 
expression system. Generally, promoter transcriptional regulatory sequences that are 
operably linked to a transcribed sequence are physically contiguous to the transcribed 
sequence, i.e., they are cis-acting. However, some transcriptional regulatory sequences, such 
as enhancers, need not be physically contiguous or located in close proximity to the coding 
sequences whose transcription they enhance. For example, in one embodiment, a promoter is 
operably linked to an ORF-containing nucleic acid sequence of the invention, as exemplified 
by, e.g., a nucleic acid sequence as set forth in Table 3. 

As used herein, the term "promoter" includes all sequences capable of driving 
transcription of a coding sequence in an expression system. Thus, promoters used in the 
constructs of the invention include cis-acting transcriptional control elements and regulatory 
sequences that are involved in regulating or modulating the timing and/or rate of 
transcription of a nucleic acid of the invention. For example, a promoter can be a cis-acting 
transcriptional control element, including an enhancer, a promoter, a transcription terminator, 
an origin of replication, a chromosomal integration sequence, 5' and 3' untranslated regions, 
or an intronic sequence, which are involved in transcriptional regulation. These cis-acting 
sequences typically interact with proteins or other biomolecules to carry out (turn on/off, 
regulate, modulate, etc.) transcription. 

The invention comprises expression cassettes comprising nucleic acids 
comprising sequences as set forth in Table 3, or comprising nucleic acids encoding the 
polypeptides as set forth in Table 4. The term "expression vector" refers to any recombinant 
expression system for the purpose of expressing a nucleic acid sequence of the invention in 
vitro or in vivo, constitutively or inducibly, in any cell, including prokaryotic, yeast, fungal, 
plant, insect or mammalian cell. The term includes linear or circular expression systems. 
The term includes expression systems that remain episomal or integrate into the host cell 
genome. The expression systems can have the ability to self-replicate or not, i.e., drive only 
transient expression in a cell. The term includes recombinant "expression cassettes" which 
contain only the minimum elements needed for transcription of the recombinant nucleic acid. 

Alignment Analysis of Sequences 

The nucleic acid and polypeptide sequences of the invention include genes 
and gene products identified and characterized by sequence identity analysis (i.e., by 
homology) using the exemplary nucleic acid and protein sequences of the invention, 
including, e.g., those set forth in Tables 3 and 4. In alternative aspects of the invention, 
nucleic acids and polypeptides within the scope of the invention include those having 98%, 
95%, 90%, 85% or 80% sequence identity (homology) to the exemplary sequences as set 
forth in Tables 3 and 4. 

For sequence comparison, typically one sequence acts as a reference 
sequence, to which test sequences are compared. When using a sequence comparison 
algorithm, test and reference sequences are entered into a computer, subsequence coordinates 
are designated, if necessary, and sequence algorithm program parameters are designated. 
Default program parameters are used unless alternative parameters are designated herein. 
The sequence comparison algorithm then calculates the percent sequence identity for the test 
sequence(s) relative to the reference sequence, based on the designated or default program 
parameters. A "comparison window", as used herein, includes reference to a segment of any 
one of the number of contiguous positions selected from the group consisting of from 25 to 
600, usually about 50 to about 200, more usually about 100 to about 150 in which a sequence 
may be compared to a reference sequence of the same number of contiguous positions after 
the two sequences are optimally aligned. Methods of alignment of sequences for comparison 
are well-known in the art. Optimal alignment of sequences for comparison can be conducted, 
e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), 
by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), 
by the search for similarity method of Pearson & Lipman, Proc. Natl. Acad. Sci. USA 
85:2444 (1988), by computerized implementations of these algorithms (CLUSTAL, GAP, 
BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics 
Computer Group, 575 Science Dr., Madison, WI), or by manual alignment and visual 
inspection. 
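
By way of illustration only, a global alignment of the Needleman & Wunsch type can be sketched as follows. The scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen for readability and are not the parameters of GAP, BESTFIT or any other named program; the function name is likewise hypothetical.

```python
# Minimal sketch of Needleman-Wunsch global alignment with a linear gap penalty.
# Scoring values are illustrative assumptions, not program defaults.

def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (aligned_a, aligned_b, score) for one optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```

The aligned strings returned by such a routine have equal length, so a percent identity can then be computed column by column over a chosen comparison window.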

In one aspect of the invention (in the methods of the invention, and to 
determine if a sequence is within the scope of the invention), a CLUSTAL algorithm is used, 
e.g., the CLUSTAL W program, see, e.g., Thompson (1994) Nuc. Acids Res. 22:4673-4680; 
Higgins (1996) Methods Enzymol 266:383-402. Variations can also be used, such as 
CLUSTAL X, see Jeanmougin (1998) Trends Biochem Sci 23:403-405; Thompson (1997) 
Nucleic Acids Res 25:4876-4882. In one aspect, the CLUSTAL W program described by 
Thompson (1994) supra, is used with the following parameters: K tuple (word) size: 1, 
window size: 5, scoring method: percentage, number of top diagonals: 5, gap penalty: 3, to 
determine whether a nucleic acid has sufficient sequence identity to an exemplary sequence 
to be within the scope of the invention. In another aspect, the algorithm PILEUP is used in the 
methods and to determine whether a nucleic acid has sufficient sequence identity to be within 
the scope of the invention. This program creates a multiple sequence alignment from a group 
of related sequences using progressive, pairwise alignments to show relationship and percent 
sequence identity. It also plots a tree or dendrogram showing the clustering relationships used 
to create the alignment. PILEUP uses a simplification of the progressive alignment method 
of Feng & Doolittle, J. Mol. Evol. 35:351-360 (1987). The method used is similar to the 
method described by Higgins & Sharp, CABIOS 5:151-153 (1989). Using PILEUP, a 
reference sequence (e.g., an exemplary GCA-associated sequence of the invention) is 
compared to another sequence to determine the percent sequence identity relationship (i.e., 
that the second sequence is substantially identical and within the scope of the invention) 
using the following parameters: default gap weight (3.00), default gap length weight (0.10), 
and weighted end gaps. In one embodiment, PILEUP obtained from the GCG sequence 
analysis software package, e.g., version 7.0 (Devereaux (1984) Nuc. Acids Res. 12:387-395), 
using the parameters described therein, is used in the methods and to identify nucleic acids 
within the scope of the invention. In another aspect, a BLAST algorithm is used (in the 
methods, e.g., to determine percent sequence identity (i.e., substantial similarity or identity) 
and whether a nucleic acid is within the scope of the invention), see, e.g., Altschul (1990) J. 
Mol. Biol. 215:403-410. Software for performing BLAST analyses is publicly available 
through the National Center for Biotechnology Information, NIH. This algorithm involves 
first identifying high scoring sequence pairs (HSPs) by identifying short words of length W 
in the query sequence, which either match or satisfy some positive-valued threshold score T 
when aligned with a word of the same length in a database sequence. T is referred to as the 
neighborhood word score threshold (Altschul (1990) supra). These initial neighborhood 
word hits act as seeds for initiating searches to find longer HSPs containing them. The word 
hits are then extended in both directions along each sequence for as far as the cumulative 
alignment score can be increased. Cumulative scores are calculated using, for nucleotide 
sequences, the parameters M (reward score for a pair of matching residues; always > 0) and 
N (penalty score for mismatching residues; always < 0). For amino acid sequences, a scoring 
matrix is used to calculate the cumulative score. Extension of the word hits in each direction 
is halted when: the cumulative alignment score falls off by the quantity X from its 
maximum achieved value; the cumulative score goes to zero or below, due to the 
accumulation of one or more negative-scoring residue alignments; or the end of either 
sequence is reached. The BLAST algorithm parameters W, T, and X determine the 
sensitivity and speed of the alignment. In one embodiment, to determine if a nucleic acid 
sequence is within the scope of the invention, the BLASTN program (for nucleotide 
sequences) is used incorporating as defaults a wordlength (W) of 11, an expectation (E) of 
10, M=5, N=-4, and a comparison of both strands. For amino acid sequences, the BLASTP 
program uses as default parameters a wordlength (W) of 3, an expectation (E) of 10, and the 
BLOSUM62 scoring matrix (see, e.g., Henikoff (1989) Proc. Natl. Acad. Sci. USA 
89:10915). 
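
The seed-and-extend procedure described above can be sketched, in greatly simplified form, for nucleotide sequences. The sketch assumes exact word matches only (no neighborhood threshold T), ungapped extension, a match reward M of 5, a mismatch penalty of -4 (the text requires N to be negative), and a hypothetical drop-off X of 20. It is not the BLAST program itself, and all function names are illustrative.

```python
# Simplified seed-and-extend sketch (exact-match seeds, ungapped extension).
# M, N and X follow the roles described in the text; the X value is hypothetical.

def find_seeds(query: str, subject: str, W: int):
    """Yield (query_pos, subject_pos) for every exact word match of length W."""
    index = {}
    for j in range(len(subject) - W + 1):
        index.setdefault(subject[j:j + W], []).append(j)
    for i in range(len(query) - W + 1):
        for j in index.get(query[i:i + W], []):
            yield i, j


def _extend(query, subject, qi, si, step, score, M, N, X):
    """Walk from (qi, si) in direction `step`; return (best score, steps kept)."""
    best, kept, steps = score, 0, 0
    while 0 <= qi < len(query) and 0 <= si < len(subject):
        score += M if query[qi] == subject[si] else N
        steps += 1
        if score > best:
            best, kept = score, steps
        # Halt on drop-off X from the maximum, or on a score of zero or below.
        if best - score >= X or score <= 0:
            break
        qi += step
        si += step
    return best, kept


def extend_seed(query, subject, qi, si, W, M=5, N=-4, X=20):
    """Extend one seed both ways; return (score, q_start, q_end, s_start)."""
    best, right = _extend(query, subject, qi + W, si + W, +1, W * M, M, N, X)
    best, left = _extend(query, subject, qi - 1, si - 1, -1, best, M, N, X)
    return best, qi - left, qi + W + right, si - left


def best_hsp(query, subject, W=11, M=5, N=-4, X=20):
    """Return the highest-scoring ungapped segment pair, or None if no seed."""
    hits = (extend_seed(query, subject, i, j, W, M, N, X)
            for i, j in find_seeds(query, subject, W))
    return max(hits, default=None)
```

A shorter word length (for example W=4) makes the sketch usable on toy sequences. Raising W reduces the number of seeds and speeds the search at the cost of sensitivity, consistent with the role of W described above.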

Hybridization for Identifying Nucleic Acids of the Invention 

Nucleic acids within the scope of the invention include isolated or 
recombinant nucleic acids that specifically hybridize under stringent hybridization conditions 
to an exemplary nucleic acid of the invention (including a sequence encoding an exemplary 
polypeptide) as set forth in Tables 3 and 4. Stringent conditions are sequence-dependent and 
will be different in different circumstances. Longer sequences hybridize specifically at 
higher temperatures. An extensive guide to the hybridization of nucleic acids is found in, 
e.g., Tijssen (1993) infra. Generally, stringent conditions are selected to be about 5 to 10°C 
lower than the thermal melting point (Tm) for the specific sequence at a defined ionic 
strength and pH. The Tm is the temperature (under defined ionic strength, pH, and nucleic 
acid concentration) at which 50% of the probes complementary to the target hybridize to the 
target sequence at equilibrium (as the target sequences are present in excess, at Tm, 50% of 
the probes are occupied at equilibrium). Stringent conditions will be those in which the salt 
concentration is less than about 1.0 M sodium ion, typically about 0.01 to 1.0 M sodium ion 
concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for 
short probes (e.g., 10 to 50 nucleotides) and at least about 60°C for long probes (e.g., greater 
than 50 nucleotides). Stringent conditions may also be achieved with the addition of 
destabilizing agents such as formamide. 
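
The relationship between Tm and a stringent hybridization temperature can be illustrated numerically. The text does not give a formula for Tm; the sketch below assumes the common rule-of-thumb estimate for short oligonucleotide probes (2°C per A or T, 4°C per G or C), which is only an approximation, and then applies the "about 5 to 10°C lower than Tm" guideline stated above. Function names are hypothetical.

```python
# Illustrative only: rule-of-thumb Tm for a short probe and the resulting
# stringent temperature window (Tm minus 10 C up to Tm minus 5 C).
# The 2/4 rule is an assumed approximation; it is not taken from the text.

def rule_of_thumb_tm(probe: str) -> float:
    """Estimate Tm in degrees C as 2*(A+T) + 4*(G+C) for a short DNA probe."""
    p = probe.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    if at + gc != len(p):
        raise ValueError("probe must contain only A, C, G and T")
    return 2.0 * at + 4.0 * gc


def stringent_window(tm: float):
    """Return (low, high) temperatures lying 10 C and 5 C below Tm."""
    return tm - 10.0, tm - 5.0


# First 18 nucleotides of the dnaN sequence (SEQ ID NO:1), used as a toy probe.
probe = "ATGGACGCGGCTACGACA"
tm = rule_of_thumb_tm(probe)
low, high = stringent_window(tm)
```

Actual melting temperatures depend on ionic strength, pH, probe concentration and any formamide present, so an estimate of this kind serves only as a starting point.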

For selective or specific hybridization, a positive signal (e.g., identification of 
a nucleic acid of the invention) is about 10 times background hybridization. "Stringent" 
hybridization conditions that are used to identify substantially identical nucleic acids within 
the scope of the invention include hybridization in a buffer comprising 50% formamide, 5x 
SSC, and 1% SDS at 42°C, or hybridization in a buffer comprising 5x SSC and 1% SDS at 
65°C, both with a wash of 0.2x SSC and 0.1% SDS at 65°C. Exemplary "moderately 
stringent hybridization conditions" include a hybridization in a buffer of 40% formamide, 1 
M NaCl, and 1% SDS at 37°C, and a wash in 1X SSC at 45°C. Those of ordinary skill will 
readily recognize that alternative but comparable hybridization and wash conditions can be 
utilized to provide conditions of similar stringency. Nucleic acids which do not hybridize to 
each other under stringent hybridization conditions are still substantially identical if the 
polypeptides which they encode are substantially identical. This may occur, e.g., when a 
copy of a nucleic acid is created using the maximum codon degeneracy permitted by the 
genetic code, as discussed herein (see discussion on "conservative substitutions"). However, 
the selection of a hybridization format is not critical; it is the stringency of the wash 
conditions that determines whether a nucleic acid is within the 
scope of the invention. Wash conditions used to identify nucleic acids within the scope of 
the invention include, e.g.: a salt concentration of about 0.02 molar at pH 7 and a temperature 
of at least about 50°C or about 55°C to about 60°C; or, a salt concentration of about 0.15 M 
NaCl at 72°C for about 15 minutes; or, a salt concentration of about 0.2X SSC at a 
temperature of at least about 50°C or about 55°C to about 60°C for about 15 to about 20 
minutes; or, the hybridization complex is washed twice with a solution with a salt 
concentration of about 2X SSC containing 0.1% SDS at room temperature for 15 minutes 
and then washed twice by 0.1X SSC containing 0.1% SDS at 68°C for 15 minutes; or, 
equivalent conditions. See Sambrook, Tijssen and Ausubel (see below) for a description of 
SSC buffer and equivalent conditions. 
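
Because the wash conditions above are stated partly in molar salt and partly in multiples of SSC, a short conversion may be helpful. The sketch assumes the standard laboratory recipe for 1X SSC (0.15 M NaCl and 0.015 M trisodium citrate), which is not recited in the text itself, and counts three sodium ions per citrate. The function name is hypothetical.

```python
# Convert an SSC strength (e.g. 0.1X, 0.2X, 2X) to approximate sodium molarity.
# Assumed recipe for 1X SSC: 0.15 M NaCl + 0.015 M trisodium citrate.

NACL_M_PER_1X = 0.15
CITRATE_M_PER_1X = 0.015  # trisodium citrate contributes three Na+ per molecule


def ssc_sodium_molarity(strength: float) -> float:
    """Return approximate total Na+ molarity for `strength` x SSC."""
    if strength < 0:
        raise ValueError("SSC strength cannot be negative")
    return strength * (NACL_M_PER_1X + 3 * CITRATE_M_PER_1X)


# SSC strengths that appear in the wash conditions described above.
wash_strengths = (0.1, 0.2, 1.0, 2.0)
sodium = {x: round(ssc_sodium_molarity(x), 4) for x in wash_strengths}
```

Under this assumption 0.1X SSC corresponds to roughly 0.02 M sodium and 2X SSC to roughly 0.4 M, which helps in comparing the alternative wash conditions on a common scale.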

General Techniques 

The nucleic acid and polypeptide sequences of the invention and other nucleic 
acids used to practice this invention, whether RNA, cDNA, genomic DNA, vectors, viruses 
or hybrids thereof, may be isolated from a variety of sources, genetically engineered, 
amplified, and/or expressed recombinantly. Any recombinant expression system can be 
used, including, in addition to bacterial cells, e.g., mammalian, yeast, insect or plant cell 
expression systems. 

Alternatively, these nucleic acids and polypeptides can be synthesized in vitro 
by well-known chemical synthesis techniques, as described in, e.g., Caruthers (1982) Cold 
Spring Harbor Symp. Quant. Biol. 47:411-418; Adams (1983) J. Am. Chem. Soc. 105:661; 
Belousov (1997) Nucleic Acids Res. 25:3440-3444; Frenkel (1995) Free Radic. Biol. Med. 
19:373-380; Blommers (1994) Biochemistry 33:7886-7896; Narang (1979) Meth. Enzymol. 
68:90; Brown (1979) Meth. Enzymol. 68:109; Beaucage (1981) Tetra. Lett. 22:1859; U.S. 
Patent No. 4,458,066. 

Techniques for the manipulation of nucleic acids, such as, e.g., generating 
mutations in sequences, subcloning, labeling probes, sequencing, hybridization and the like 
are well described in the scientific and patent literature, see, e.g., Sambrook, ed., 
Molecular Cloning: a Laboratory Manual (2nd ed.), Vols. 1-3, Cold Spring Harbor 
Laboratory, (1989); Current Protocols in Molecular Biology, Ausubel, ed. John Wiley 
& Sons, Inc., New York (1997); Laboratory Techniques in Biochemistry and 
Molecular Biology: Hybridization With Nucleic Acid Probes, Part I. Theory and 
Nucleic Acid Preparation, Tijssen, ed. Elsevier, N.Y. (1993). 

Polypeptides and peptides of the invention can also be synthesized, whole or 
in part, using chemical methods well known in the art. See e.g., Caruthers (1980) Nucleic 
Acids Res. Symp. Ser. 215-223; Horn (1980) Nucleic Acids Res. Symp. Ser. 225-232; 
Banga, A.K., Therapeutic Peptides and Proteins, Formulation, Processing and Delivery 
Systems (1995) Technomic Publishing Co., Lancaster, PA. For example, peptide synthesis 
can be performed using various solid-phase techniques (see e.g., Roberge (1995) Science 
269:202; Merrifield (1997) Methods Enzymol. 289:3-13) and automated synthesis may be 
achieved, e.g., using the ABI 431A Peptide Synthesizer (Perkin Elmer) in accordance with 
the instructions provided by the manufacturer. 

The skilled artisan will recognize that individual synthetic residues and 
polypeptides incorporating mimetics can be synthesized using a variety of procedures and 
methodologies, which are well described in the scientific and patent literature, e.g., Organic 
Syntheses Collective Volumes, Gilman, et al. (Eds) John Wiley & Sons, Inc., NY. 
Polypeptides incorporating mimetics can also be made using solid phase synthetic 
procedures, as described, e.g., by Di Marchi, et al., U.S. Pat. No. 5,422,426. Peptides and 
peptide mimetics of the invention can also be synthesized using combinatorial 
methodologies. Various techniques for generation of peptide and peptidomimetic libraries 
are well known, and include, e.g., multipin, tea bag, and split-couple-mix techniques; see, 
e.g., al-Obeidi (1998) Mol. Biotechnol. 9:205-223; Hruby (1997) Curr. Opin. Chem. Biol. 
1:114-119; Ostergaard (1997) Mol. Divers. 3:17-27; Ostresh (1996) Methods Enzymol. 
267:220-234. Modified peptides of the invention can be further produced by chemical 
modification methods, see, e.g., Belousov (1997) Nucleic Acids Res. 25:3440-3444; Frenkel 
(1995) Free Radic. Biol. Med. 19:373-380; Blommers (1994) Biochemistry 33:7886-7896. 

Peptides and polypeptides of the invention can also be synthesized and 
expressed as fusion proteins with one or more additional domains linked thereto for, e.g., 
producing a more immunogenic peptide, to more readily isolate a recombinantly synthesized 
peptide, to identify and isolate antibodies and antibody-expressing B cells, and the like. 
Detection and purification facilitating domains include, e.g., metal chelating peptides such as 
polyhistidine tracts and histidine-tryptophan modules that allow purification on immobilized 
metals, protein A domains that allow purification on immobilized immunoglobulin, and the 
domain utilized in the FLAGS extension/affinity purification system (Immunex Corp, Seattle 
WA). The inclusion of cleavable linker sequences such as Factor Xa or enterokinase 
(Invitrogen, San Diego CA) between the purification domain and GCA-associated peptide or 
polypeptide can be useful to facilitate purification. For example, an expression vector can 
include an epitope-encoding nucleic acid sequence linked to six histidine residues followed 
by a thioredoxin and an enterokinase cleavage site (see e.g., Williams (1995) Biochemistry 
34:1787-1797; Dobeli (1998) Protein Expr. Purif. 12:404-414). The histidine residues 
facilitate detection and purification while the enterokinase cleavage site provides a means for 
purifying the epitope from the remainder of the fusion protein. Technology pertaining to 
vectors encoding fusion proteins and application of fusion proteins are well described in the 
scientific and patent literature, see e.g., Kroll (1993) DNA Cell. Biol., 12:441-53. 
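
The use of a histidine tag together with a cleavable linker can be pictured with a toy model. The sketch assumes the commonly cited enterokinase recognition motif Asp-Asp-Asp-Asp-Lys (DDDDK, cleaved after the lysine), a six-histidine tag, and an arbitrary placeholder peptide standing in for the epitope; the domain order and all sequences shown are hypothetical and are not taken from the cited vectors.

```python
# Toy model: cleave a fusion protein at an enterokinase site and separate the
# His-tagged fragment from the released peptide. Sequences are placeholders.
import re

HIS6 = "HHHHHH"    # six-histidine purification tag
EK_SITE = "DDDDK"  # assumed enterokinase motif; cleavage after the final K


def cleave_enterokinase(protein: str):
    """Split `protein` after every DDDDK motif; return the list of fragments."""
    fragments, start = [], 0
    for site in re.finditer(EK_SITE, protein):
        fragments.append(protein[start:site.end()])
        start = site.end()
    fragments.append(protein[start:])
    return [f for f in fragments if f]


def partition_by_tag(fragments):
    """Return (tagged, untagged): fragments with and without the His6 tag."""
    tagged = [f for f in fragments if HIS6 in f]
    untagged = [f for f in fragments if HIS6 not in f]
    return tagged, untagged


# Hypothetical construct: Met, His6 tag, short spacer, EK site, placeholder epitope.
fusion = "M" + HIS6 + "GSGS" + EK_SITE + "AQPEPTIDEV"
tagged, released = partition_by_tag(cleave_enterokinase(fusion))
```

In this model the tagged fragment would remain bound to an immobilized-metal resin while the released fragment, here the placeholder epitope, would be recovered free of the tag.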

The invention provides antibodies that specifically bind to the polypeptides of 
the invention, as set forth in Table 4. These antibodies can be useful in the screening 
methods of the invention. The polypeptides or peptides can be conjugated to another 
molecule or can be administered with an adjuvant. The coding sequence can be part of an 
expression cassette or vector capable of expressing the immunogen in vivo (see, e.g., 
Katsumi (1994) Hum. Gene Ther. 5:1335-9). Methods of producing polyclonal and 
monoclonal antibodies are known to those of skill in the art and described in the scientific 
and patent literature, see, e.g., Coligan, Current Protocols in Immunology, 
Wiley/Greene, NY (1991); Stites (eds.) Basic and Clinical Immunology (7th ed.) Lange 
Medical Publications, Los Altos, CA; Goding, Monoclonal Antibodies: Principles and 
Practice (2d ed.) Academic Press, New York, NY (1986); Harlow (1988) Antibodies, a 
Laboratory Manual, Cold Spring Harbor Publications, New York. 

Antibodies also can be generated in vitro, e.g., using recombinant antibody 
binding site expressing phage display libraries, in addition to the traditional in vivo methods 
using animals. See, e.g., Huse (1989) Science 246:1275; Ward (1989) Nature 341:544; 
Hoogenboom (1997) Trends Biotechnol. 15:62-70; Katz (1997) Annu. Rev. Biophys. 
Biomol. Struct. 26:27-45. Human antibodies can be generated in mice engineered to produce 
only human antibodies, as described by, e.g., U.S. Patent No. 5,877,397; 5,874,299; 
5,789,650; and 5,939,598. B-cells from these mice can be immortalized using standard 
techniques (e.g., by fusing with an immortalizing cell line such as a myeloma or by 
manipulating such B-cells by other techniques to perpetuate a cell line) to produce a 
monoclonal human antibody-producing cell. See, e.g., U.S. Patent No. 5,916,771; 5,985,615. 

TABLE 3 

>Rv0002 dnaN DNA polymerase III, b-subunit TB.seq 2052:3257 MW:42114 
>emb|AL123456|MTBH37RV:2052-3260, dnaN SEQ ID NO:1 

ATGGACGCGGCTACGACAAGAGTTGGCCTCACCGACTTGACGTTTCGTTTGCTACGAGAGTCTT 
TCGCCGATGCGGTGTCGTGGGTGGCTAAAAATCTGCCAGCCAGGCCCGCGGTGCCGGTGCTCT 
CCGGCGTGTTGTTGACCGGCTCGGACAACGGTCTGACGATTTCCGGATTCGACTACGAGGTTTC 
CGCCGAGGCCCAGGTTGGCGCTGAAATTGTTTCTCCTGGAAGCGTTTTAGTTTCTGGCCGATTG 

TTGTCCGATATTACCCGGGCGTTGCCTAACAAGCCCGTAGACGTTCATGTCGAAGGTAACCGGG 
TCGCATTGACCTGCGGTAACGCCAGGTTTTCGCTACCGACGATGCCAGTCGAGGATTATCCGAC 
GCTGCCGACGCTGCCGGAAGAGACCGGATTGTTGCCTGCGGAATTATTCGCCGAGGCAATCAG 
TCAGGTCGCTATCGCCGCCGGCCGGGACGACACGTTGCCTATGTTGACCGGCATCCGGGTCGA 
AATCCTCGGTGAGACGGTGGTTTTGGCCGCTACCGACAGGTTTCGCCTGGCTGTTCGAGAACTG 

AAGTGGTCGGCGTCGTCGCCAGATATCGAAGCGGCTGTGCTGGTCCCGGCCAAGACGCTGGC 
CGAGGCCGCCAAAGCGGGCATCGGCGGCTCTGACGTTCGTTTGTCGTTGGGTACTGGGCCGG 
GGGTGGGCAAGGATGGCCTGCTCGGTATCAGTGGGAACGGCAAGCGCAGCACCACGCGACTT 
CTTGATGCCGAGTTCCCGAAGTTTCGGCAGTTGCTACCAACCGAACACACCGCGGTGGCCACC 
ATGGACGTGGCCGAGTTGATCGAAGCGATCAAGCTGGTTGCGTTGGTAGCTGATCGGGGCGCG 

CAGGTGCGCATGGAGTTCGCTGATGGCAGCGTGCGGCTTTCTGCGGGTGCCGATGATGTTGGA 
CGAGCCGAGGAAGATCTTGTTGTTGACTATGCCGGTGAACCATTGACGATTGCGTTTAACCCAA 
CCTATCTAACGGACGGTTTGAGTTCGTTGCGCTCGGAGCGAGTGTCTTTCGGGTTTACGACTGC 
GGGTAAGCCTGCCTTGCTACGTCCGGTGTCCGGGGACGATCGCCCTGTGGCGGGTCTGAATGG 
CAACGGTCCGTTCCCGGCGGTGTCGACGGACTATGTCTATCTGTTGATGCCGGTTCGGTTGCCG 

GGCTGA 

>Rv0003 recF DNA replication and SOS induction TB.seq 3280:4434 MW:42181 
>emb|AL123456|MTBH37RV:3280-4437, recF SEQ ID NO:2 

GTGTACGTCCGTCATTTGGGGCTGCGTGACTTCCGGTCCTGGGCATGTGTAGATCTGGAATTGC 
ATCCAGGGCGGACGGTTTTTGTTGGGCCTAACGGTTATGGTAAGACGAATCTTATTGAGGCACT 
GTGGTATTCGACGACGTTAGGTTCGCACCGCGTTAGCGCCGATTTGCCGTTGATCCGGGTAGGT 

ACCGATCGTGCGGTGATCTCCACGATCGTGGTGAACGACGGTAGAGAATGTGCCGTCGACCTC 

GAGATCGCCACGGGGCGAGTCAACAAAGCGCGATTGAATCGATCATCGGTCCGAAGTACACGT 

GATGTGGTCGGAGTGCTTCGAGCTGTGTTGTTTGCCCCTGAGGATCTGGGGTTGGTTCGTGGG 

GATCCCGCTGACCGGCGGCGCTATCTGGATGATCTGGCGATCGTGCGTAGGCCTGCGATCGCT 

GCGGTACGAGCCGAATATGAGAGGGTGTTGCGCCAGCGGACGGCGTTATTGAAGTCCGTACCT 

GGAGCACGGTATCGGGGTGACCGGGGTGTGTTTGACACTCTTGAGGTATGGGACAGTCGTTTG 

GCGGAGCACGGGGCTGAACTGGTGGCCGCCCGCATCGATTTGGTCAACCAGTTGGCACCGGA 

AGTGAAGAAGGCATACCAGCTGTTGGCGCCGGAATCGCGATCGGCGTCTATCGGTTATCGGGC 

CAGCATGGATGTAAeCGGTCCCAGCGAGCAGTCAGATATCGATCGGCAATTGTTAGCAGCTCGG 

CTGTTGGCGGCGCTGGCGGCCCGTCGGGATGCCGAACTCGAGCGTGGGGTTTGTCTAGTTGGT 

CCGCACCGTGACGACCTAATACTGCGACTAGGCGATCAACCCGCGAAAGGATTTGCTAGCCATG 

GGGAGGCGTGGTCGTTGGCGGTGGCACTGCGGTTGGCGGCCTATCAACTGTTACGCGTTGATG 

GTGGTGAGCCGGTGTTGTTGCTCGACGACGTGTTCGCCGAACTGGATGTCATGCGCCGTCGAG 

CGTTGGCGACGGCGGCCGAGTCCGCCGAACAGGTGTTGGTGACTGCCGCGGTGCTCGAGGAT 

ATTCCCGCCGGCTGGGACGCCAGGCGGGTGCACATCGATGTGCGTGCCGATGACACCGGATC 

GATGTCGGTGGTTCTGCCATGA 

>Rv0005 gyrB DNA gyrase subunit B TB.seq 5123:7264 MW:78441 
>emb|AL123456|MTBH37RV:5123-7267, gyrB SEQ ID NO:3 

ATGGGTAAAAACGAGGCCAGAAGATCGGCCCTGGCGCCCGATCACGGTACAGTGGTGTGCGAC 
CCCCTGCGGCGACTCAACCGCATGCACGCAACCCCTGAGGAGAGTATTCGGATCGTGGCTGCC 
CAGAAAAAGAAGGCCCAAGACGAATACGGCGCTGCGTCTATCACCATTCTCGAAGGGCTGGAG 
GCCGTCCGCAAACGTCCCGGCATGTACATTGGCTCGACCGGTGAGCGCGGTTTACACCATCTC 
ATTTGGGAGGTGGTCGACAACGCGGTCGACGAGGCGATGGCCGGTTATGCAACCACAGTGAAC 
GTAGTGCTGCTTGAGGATGGCGGTGTCGAGGTCGCCGACGACGGCCGCGGCATTCCGGTCGC 
CACCCACGCCTCCGGCATACCGACCGTCGACGTGGTGATGACACAACTACATGCCGGCGGCAA 
GTTCGACTCGGACGCGTATGCGATATCTGGTGGTCTGCACGGCGTCGGCGTGTCGGTGGTTAA 
CGCGCTATCCACCCGGCTCGAAGTCGAGATCAAGCGCGACGGGTACGAGTGGTCTCAGGTTTA 
TGAGAAGTCGGAACCCCTGGGCCTCAAGCAAGGGGCGCCGACCAAGAAGACGGGGTCAACGG 
TGCGGTTCTGGGCCGACCCCGCTGTTTTCGAAACCACGGAATACGACTTCGAAACCGTCGCCC 
GCCGGCTGCAAGAGATGGCGTTCCTCAACAAGGGGCTGACCATCAACCTGACCGACGAGAGGG 
TGACCCAAGACGAGGTCGTCGACGAAGTGGTCAGCGACGTCGCCGAGGCGCCGAAGTCGGCA 
AGTGAACGCGCAGCCGAATCCACTGCACCGCACAAAGTTAAGAGCCGCACCTTTCACTATCCGG 
GTGGCCTGGTGGACTTCGTGAAACACATCAACCGCACCAAGAACGCGATTCATAGCAGCATCGT 
GGACTTTTCCGGCAAGGGCACCGGGCACGAGGTGGAGATCGCGATGCAATGGAACGCCGGGT 
ATTCGGAGTCGGTGCACACCTTCGCCAACACCATCAACACCCACGAGGGCGGCACCCACGAAG 
AGGGCTTCCGCAGCGCGCTGACGTCGGTGGTGAACAAGTACGCCAAGGACCGCAAGCTACTGA 
AGGACAAGGACCCCAACCTCACCGGTGACGATATCCGGGAAGGCCTGGCCGCTGTGATCTCGG 
TGAAGGTCAGCGAACCGCAGTTCGAGGGCCAGACCAAGACCAAGTTGGGCAACACCGAGGTCA 
AATCGTTTGTGCAGAAGGTCTGTAACGAACAGCTGACCCACTGGTTTGAAGCCAACCCCACCGA 
CGCGAAAGTCGTTGTGAACAAGGCTGTGTCCTCGGCGCAAGCCCGTATCGCGGCACGTAAGGC 
ACGAGAGTTGGTGCGGCGTAAGAGCGCCACCGACATCGGTGGATTGCCCGGCAAGCTGGCCG 
ATTGCCGTTCCACGGATCCGCGCAAGTCCGAACTGTATGTCGTAGAAGGTGACTCGGCCGGCG 
GTTCTGCAAAAAGCGGTCGCGATTCGATGTTCCAGGCGATACTTCCGCTGCGCGGCAAGATCAT 
CAATGTGGAGAAAGCGCGCATCGACCGGGTGCTAAAGAACACCGAAGTTCAGGCGATCATCAC 

GGCGCTGGGCACCGGGATCCACGACGAGTTCGATATCGGCAAGCTGCGCTACCACAAGATCGT 
GCTGATGGCCGACGCCGATGTTGACGGCCAACATATTTCCACGCTGTTGTTGACGTTGTTGTTC 
CGGTTCATGCGGCCGCTCATCGAGAACGGGCATGTGTTTTTGGCACAACCGCCGCTGTACAAAC 
TCAAGTGGCAGCGCAGTGACCCGGAATTCGCATACTCCGACCGCGAGCGCGACGGTCTGCTGG 
AGGCGGGGCTGAAGGCCGGGAAGAAGATCAACAAGGAAGACGGCATTCAGCGGTACAAGGGT 

CTAGGTGAAATGGACGCTAAGGAGTTGTGGGAGACCACCATGGATCCCTCGGTTCGTGTGTTGC 
GTCAAGTGACGCTGGACGACGCCGCCGCCGCCGACGAGTTGTTCTCCATCCTGATGGGCGAGG 
ACGTCGAGGCGCGGCGCAGCTTTATCACCCGCAACGCCAAGGATGTTCGGTTCCTGGATGTCTA 
A 

>Rv0006 gyrA DNA gyrase subunit A TB.seq 7302:9815 MW:92276 
>emb|AL123456|MTBH37RV:7302-9818, gyrA SEQ ID NO:4 

ATGACAGACACGACGTTGCCGCCTGACGACTCGCTCGACCGGATCGAACCGGTTGACATCGAG 

CAGGAGATGCAGCGCAGCTACATCGACTATGCGATGAGCGTGATCGTCGGCCGCGCGCTGCCG 

GAGGTGCGCGACGGGCTCAAGCCCGTGCATCGCCGGGTGCTCTATGCAATGTTCGATTCCGGC 

TTCCGCCCGGACCGCAGCCACGCCAAGTCGGCCCGGTCGGTTGCCGAGACCATGGGCAACTA 
CCACCCGCACGGCGACGCGTCGATCTACGACAGCCTGGTGCGCATGGCCCAGCCCTGGTCGC 
TGCGCTACCCGCTGGTGGACGGCCAGGGCAACTTCGGCTCGCCAGGCAATGACCCACCGGCG 
GCGATGAGGTACACCGAAGCCCGGCTGACCCCGTTGGCGATGGAGATGCTGAGGGAAATCGAC 
GAGGAGACAGTCGATTTCATCCCTAACTACGACGGCCGGGTGCAAGAGCCGACGGTGCTACCC 

AGCCGGTTCCCCAACCTGCTGGCCAACGGGTCAGGCGGCATCGCGGTCGGCATGGCAACCAAT 
ATCCCGCCGCACAACCTGCGTGAGCTGGCCGACGCGGTGTTCTGGGCGCTGGAGAATCACGAC 
GCCGACGAAGAGGAGACCCTGGCCGCGGTCATGGGGCGGGTTAAAGGCCCGGACTTCCCGAC 
CGCCGGACTGATCGTCGGATCCCAGGGCACCGCTGATGCCTACAAAACTGGCCGCGGCTCCAT 
TCGAATGCGCGGAGTTGTTGAGGTAGAAGAGGATTCCCGCGGTCGTACCTCGCTGGTGATCAC 

CGAGTTGCCGTATCAGGTCAACCACGACAACTTCATCACTTCGATCGCCGAACAGGTCCGAGAC 
GGCAAGCTGGCCGGCATTTCCAACATTGAGGACCAGTCTAGCGATCGGGTCGGTTTACGCATC 
GTCATCGAGATCAAGCGCGATGCGGTGGCCAAGGTGGTGATCAATAACCTTTACAAGCACACCC 
AGCTGCAGACCAGCTTTGGCGCCAACATGCTAGCGATCGTCGACGGGGTGCCGCGCACGCTGC 
GGCTGGACCAGCTGATCCGCTATTACGTTGACCACCAACTCGACGTCATTGTGCGGCGCACCAC 
CTACCGGCTGCGCAAGGCAAACGAGCGAGCCCACATTCTGCGCGGCCTGGTTAAAGCGCTCGA 
CGCGCTGGACGAGGTCATTGCACTGATCCGGGCGTCGGAGACCGTCGATATCGCCCGGGCCG 
GACTGATCGAGCTGCTCGACATCGACGAGATCCAGGCCCAGGCAATCCTGGACATGCAGTTGC 
GGCGCCTGGCCGCACTGGAACGCCAGCGCATCATCGACGACCTGGCCAAAATCGAGGCCGAG 
ATCGCCGATCTGGAAGACATCCTGGCAAAACCCGAGCGGCAGCGTGGGATCGTGCGCGACGAA 
CTCGCCGAAATCGTGGACAGGCACGGCGACGACCGGCGTACCCGGATCATCGCGGCCGACGG 
AGACGTCAGCGACGAGGATTTGATCGCCCGCGAGGACGTCGTTGTCACTATCACCGAAACGGG 

ATACGCCAAGCGCACCAAGACCGATCTGTATCGCAGCCAGAAACGCGGCGGCAAGGGCGTGCA 
GGGTGCGGGGTTGAAGCAGGACGACATCGTCGCGCACTTCTTCGTGTGCTCCACCCACGATTT 
GATCCTGTTCTTCACCACCCAGGGACGGGTTTATCGGGCCAAGGCCTACGACTTGCCCGAGGC 
CTCCCGGACGGCGCGCGGGCAGCACGTGGCCAACCTGTTAGCCTTCCAGCCCGAGGAACGCA 
TCGCCCAGGTCATCCAGATTCGCGGCTACACCGACGCCCCGTACCTGGTGCTGGCCACTCGCA 

ACGGGCTGGTGAAAAAGTCCAAGCTGACCGACTTCGACTCCAATCGCTCGGGCGGAATCGTGG 
CGGTCAACCTGCGCGACAACGACGAGCTGGTCGGTGCGGTGCTGTGTTCGGCCGGCGACGAC 
CTGCTGCTGGTCTCGGCCAACGGGCAGTCCATCAGGTTCTCGGCGACCGACGAGGCGCTGCG 
GCCAATGGGTCGTGCCACCTCGGGTGTGCAGGGCATGCGGTTCAATATCGACGACCGGCTGCT 
GTCGCTGAACGTCGTGCGTGAAGGCACCTATCTGCTGGTGGCGACGTCAGGGGGCTATGCGAA 

ACGTACCGCGATCGAGGAATACCCGGTACAGGGCCGCGGCGGTAAAGGTGTGCTGACGGTCAT 
GTACGACCGCCGGCGCGGCAGGTTGGTTGGGGCGTTGATTGTCGACGACGACAGCGAGCTGT 
ATGCCGTCACTTCCGGCGGTGGCGTGATCCGCACCGCGGCACGCCAGGTTCGCAAGGCGGGA 
CGGCAGACCAAGGGTGTTCGGTTGATGAATCTGGGCGAGGGCGACACACTGTTGGCCATCGCG 
CGCAACGCCGAAGAAAGTGGCGACGATAATGCCGTGGACGCCAACGGCGCAGACCAGACGGG 

CAATTAA 

>Rv0014c pknB serine-threonine protein kinase TB.seq 15593:17470 MW:66511 
>emb|AL123456|MTBH37RV:c17470-15590, pknB SEQ ID NO:5 

ATGACCACCCCTTCCCACCTGTCCGACCGCTACGAACTTGGCGAAATCCTTGGATTTGGGGGCA 
TGTCCGAGGTCCACCTGGCCCGCGACCTCCGGTTGCACCGCGACGTTGCGGTCAAGGTGCTGC 
GCGCTGATCTAGCCCGCGATCCCAGTTTTTACCTTCGCTTCCGGCGTGAGGCGCAAAACGCCG 
CGGCATTGAACCACCCTGCAATCGTCGCGGTCTACGACACCGGTGAAGCCGAAACGCCCGCCG 
GGCCATTGCCCTACATCGTCATGGAATACGTCGACGGCGTTACCCTGCGCGACATTGTCCACAC 
CGAAGGGCCGATGACGCCCAAACGCGCCATCGAGGTCATCGCCGACGCCTGCCAAGCGCTGA 
ACTTCAGTCATCAGAACGGAATCATCCACCGTGACGTCAAGCCGGCGAACATCATGATCAGCGC 
GACCAATGCAGTAAAGGTGATGGATTTCGGCATCGCCCGCGCCATTGCCGACAGCGGCAACAG 
CGTGACCCAGACCGCAGCAGTGATCGGCACGGCGCAGTACCTGTCACCCGAACAGGCCCGGG 
GTGATTCCGTCGACGCCCGATCCGATGTCTATTCCTTGGGCTGTGTTCTTTATGAAGTCCTCACC 

GGGGAGCCACCTTTCACCGGCGACTCACCCGTCTCGGTTGCCTACCAACATGTGCGCGAAGAC 

CCGATCCCACCTTCGGCGCGGCACGAAGGCCTCTCCGCCGACCTGGACGCCGTCGTTCTCAAG 

GCGCTGGCCAAAAATCCGGAAAACCGCTATCAGACAGCGGCGGAGATGCGCGCCGACCTGGTC 

CGCGTGCACAACGGTGAGCCGCCCGAGGCGCCCAAAGTGCTCACCGATGCCGAGCGGACCTC 

GCTGCTGTCGTCTGCGGCCGGCAACCTTAGCGGTCCGCGCACCGATCCGCTACCACGCCAGGA 

CTTAGACGACACCGACCGTGACCGCAGCATCGGTTCGGTGGGCCGTTGGGTTGCGGTGGTCGC 

CGTGCTCGCTGTGCTGACCGTCGTGGTAACCATCGCCATCAACACGTTCGGCGGCATCACCCG 

CGACGTTCAAGTTCCCGACGTTCGGGGTCAATCCTCCGCCGACGCCATCGCCACACTGCAAAA 

CCGGGGCTTCAAAATCCGCACCTTGCAGAAGCCGGACTCGACAATCCCACCGGACCACGTTAT 

CGGCACCGACCCGGCCGCCAACACGTCGGTGAGTGCAGGCGACGAGATCACAGTCAACGTGT 

CCACCGGACCCGAGCAACGCGAAATACCCGACGTCTCCACGCTGACATACGCCGAAGCGGTCA 

AGAAACTGACTGCCGCCGGATTCGGCCGCTTCAAGCAAGCGAATTCGCCGTCCACCCCGGAAC 

TGGTGGGCAAGGTCATCGGGACCAACCCGCCAGCCAACCAGACGTCGGCCATCACCAATGTGG 

TCATCATCATCGTTGGCTCTGGTCCGGCGACCAAAGACATTCCCGATGTCGCGGGCCAGACCGT 

CGACGTGGCGCAGAAGAACCTCAACGTCTACGGCTTCACCAAATTCAGTCAGGCCTCGGTGGA 

CAGCCCCCGTCCCGCCGGCGAGGTGACCGGCACCAATCCACCCGCAGGCACCACAGTTCCGG 

TCGATTCAGTCATCGAACTACAGGTGTCCAAGGGCAACCAATTCGTCATGCCCGACCTATCCGG 

CATGTTCTGGGTCGACGCCGAACCACGATTGCGCGCGCTGGGCTGGACCGGGATGCTCGACAA 

AGGGGCCGACGTCGACGCCGGTGGCTCCCAACACAACCGGGTCGTCTATCAAAACCCGCCGG 

CGGGGACCGGCGTCAACCGGGACGGCATCATCACGCTGAGGTTCGGCCAGTAG 

>Rv0016c pbpA TB.seq 18762:20234 MW:51577 
>emb|AL123456|MTBH37RV:c20234-18759, pbpA SEQ ID NO:6 

ATGAACGCCTCTCTGCGCCGAATATCGGTGACCGTGATGGCGTTGATCGTGTTGCTACTGCTCA 

ACGCGACCATGACGCAGGTCTTCACCGCCGACGGGCTGCGTGCCGATCCCCGCAACCAGCGA 

GTGTTGCTCGACGAGTATTCACGGCAGCGCGGCCAGATCACCGCTGGTGGCCAACTGCTGGCG 

TACTCGGTAGCCACCGACGGCCGCTTTCGTTTCCTGCGGGTCTATCCCAATCCTGAGGTGTACG 

CGCCGGTTACCGGCTTCTACTCCCTGCGCTATTCCAGCACCGCCCTAGAACGAGCCGAGGACC 

CGATATTGAACGGGTCCGACCGCCGTCTGTTCGGCCGCCGGCTGGCCGACTTCTTCACCGGTC 

GCGACCCACGCGGCGGTAATGTCGATACCACGATCAACCCGCGCATTCAGCAAGCCGGCTGGG 

ACGCGATGCAGCAAGGCTGCTACGGGCCCTGTAAGGGAGCGGTGGTCGCCCTTGAGCCATCAA 

CCGGCAAGATTTTGGCGTTGGTGTCTTCTCCGTCCTACGACCCCAACCTGCTGGCGTCGCATAA 

CCCCGAGGTGCAGGCGCAAGCCTGGCAGCGGCTTGGCGACAATCCCGCCTCTCCACTGACCAA 

CCGTGCCATCTCTGAGACGTATCCACCGGGTTCGACTTTCAAAGTGATCACCACTGCGGCCGCG 

CTGGCCGCCGGGGCCACCGAGACCGAACAGCTGACTGCGGCGCCCACAATTCCGTTGCCAGG 

CAGCACCGCCCAGCTAGAGAACTACGGCGGTGCGCCGTGCGGGGACGAACCCACCGTGTCGC 
TGCGTGAGGCATTCGTCAAATCATGCAACACCGCATTCGTCCAGCTGGGCATCCGCACCGGCG 

CCGACGCCCTGCGCAGCATGGCGCGCGCGTTCGGTCTCGATAGCCCACCGCGCCCAACTCCG 

CTGCAAGTGGCGGAATCAACCGTCGGGCCTATCCCGGACAGCGCCGCACTAGGGATGACCAGT 

ATCGGCCAAAAGGACGTTGCGCTGACCCCGCTAGCGAACGCAGAAATAGCCGCGACCATCGCA 

AACGGCGGCATTACGATGAGGCCTTATCTAGTCGGCAGCCTCAAGGGACCGGACCTAGCCAAT 

ATCTCAACCACCGTCGGATACCAGCAGCGCCGCGCGGTGTCACCGCAGGTCGCCGCTAAGCTA 

ACAGAGCTGATGGTCGGCGCCGAGAAAGTCGCACAGCAGAAAGGGGCAATCCCCGGCGTGCA 

GATCGCATCCAAGACGGGCACCGCCGAACATGGCACCGACCCTCGTCACACTCCACCGCACGC 

TTGGTACATCGCCTTTGCGCCCGCACAAGCGCCCAAGGTGGCTGTTGCCGTGCTGGTGGAGAA 

CGGGGCTGATCGGCTGTCCGCCACCGGAGGTGCCCTCGCGGCACCGATCGGGCGGGCGGTG 

ATCGAAGCCGCACTGCAGGGGGAACCATGA 

>Rv0017c rodA TB.seq 20234:21640 MW:50612 
>emb|AL123456|MTBH37RV:c21640-20231, rodA SEQ ID NO:7 

ATGACGACACGACTGCAAGCGCCGGTGGCCGTAACGCCCCCGTTGCCGACTCGGCGCAACGC 

TGAACTGCTGCTGCTGTGCTTTGCCGCCGTAATCACGTTTGCCGCACTGCTGGTCGTGCAGGCC 

AATCAAGACCAGGGGGTGCCCTGGGACTTGACTAGCTACGGACTGGCCTTCCTGACCCTGTTC 

GGATCCGCGCATCTGGCCATCCGGCGCTTCGCCCCCTACACTGACCCGCTGTTGCTCCCGGTG 

GTGGCACTGCTCAACGGACTTGGCCTGGTAATGATCCACCGCCTCGATCTGGTGGACAACGAG 

ATCGGCGAGCATCGGCACCCCAGCGCAAACCAGCAGATGCTGTGGACGCTGGTGGGCGTAGC 

TGCCTTCGCGCTCGTGGTGACCTTCCTCAAGGACCACCGACAGCTCGCACGCTACGGCTACATT 

TGCGGGCTCGCGGGTCTGGTTTTCTTGGCAGTTCCCGCGCTGCTCCCGGCAGCACTGTCCGAA 

CAGAACGGCGCCAAGATCTGGATCCGGTTGCCCGGCTTCTCGATTCAACCCGCCGAATTTTCAA 

AGATTCTGCTGCTGATCTTCTTTTCGGCGGTACTGGTGGCCAAACGCGGCCTGTTCACCAGCGC 

CGGCAAACATTTGCTCGGAATGACCCTGCCGCGCCCGCGAGACCTCGCGCCACTGTTGGCAGC 

CTGGGTCATCTCGGTGGGTGTGATGGTCTTCGAGAAAGACCTCGGCGCTTCGCTGCTGCTGTAC 

ACATCGTTTCTGGTGGTGGTTTACCTCGCCACCCAGCGGTTCAGTTGGGTCGTCATCGGCCTGA 

CTCTGTTCGCGGCAGGAACCTTGGTGGCGTACTTCATTTTTGAGCACGTCCGGCTCCGCGTACA 

GACCTGGCTGGATCCGTTCGCAGATCCAGACGGCACCGGATATCAGATCGTGCAGTCGCTTTTC 

AGCTTCGCTACAGGCGGTATCTTCGGCACCGGGCTCGGTAATGGTCAACCCGACACCGTGCCC 

GCGGCATCCACCGATTTCATCATCGCCGCGTTCGGCGAAGAGCTTGGGTTGGTGGGCTTGACG 

GCCATCCTGATGCTCTACACCATCGTGATCATCCGGGGTTTGCGCACGGCCATCGCCACCCGC 

GATAGCTTCGGCAAGCTGCTGGCCGCCGGCCTCTCATCGACGCTAGCCATTCAGCTGTTCATCG 

TCGTCGGCGGTGTGACCCGACTCATTCCGCTGACCGGGTTGACCACACCGTGGATGTCCTACG 

GCGGGTCTTCACTGCTGGCCAACTACATATTGCTGGCCATCCTGGCACGCATCTCGCACGGAGC 

CCGCCGCCCACTGCGCACCCGCCCACGAAATAAGTCGCCGATTACGGCGGCCGGCACCGAGG 

TCATCGAACGCGTATGA 

>Rv0018c ppp TB.seq 21640:23181 MW:53781 
>emb|AL123456|MTBH37RV:c23181-21637, ppp SEQ ID NO:8 

GTGGCGCGCGTGACCCTGGTCCTGCGATACGCGGCGCGCAGCGATCGCGGCTTGGTACGCGC 
CAACAACGAAGACTCGGTCTACGCTGGGGCACGGCTATTGGCCCTGGCCGACGGCATGGGTG 
GGCATGCGGCCGGCGAGGTGGCGTCCCAGTTGGTGATTGCCGCATTGGCCCATCTCGATGACG 
ACGAGCCCGGTGGCGATCTGCTGGCCAAGCTGGATGCCGCGGTGCGCGCCGGCAACTCGGCT 
ATCGCAGCGCAAGTCGAGATGGAGCCCGATCTCGAAGGCATGGGTACCACGCTCACCGCAATC 
CTGTTCGCGGGCAACCGGCTCGGCCTGGTGCATATCGGTGACTCGCGCGGTTACCTGCTGCGC 

GACGGTGAGCTGACGCAGATCACCAAGGACGACACGTTTGTCCAAACGCTGGTCGACGAAGGC 
CGGATCACCCCGGAGGAGGCGCACAGCCACCCGCAACGCTCGTTGATCATGCGGGCGTTGAC 
CGGCCATGAGGTCGAACCGACGCTGACCATGCGAGAAGCCCGCGCCGGTGATCGTTACCTGCT 
GTGCTCGGACGGGTTGTCCGATCCGGTTAGCGATGAAACTATCCTCGAGGCCCTGCAGATCCC 
CGAGGTTGCCGAGAGCGCTCACCGCCTCATTGAACTGGCGCTGCGCGGCGGCGGCCCCGACA 

ACGTCACTGTCGTCGTCGCCGACGTCGTCGACTACGACTACGGCCAGACCCAACCGATTCTGG 
CCGGGGCGGTCTCAGGCGACGACGACCAACTGACCCTGCCCAACACCGCCGCCGGCCGGGCC 
TCTGCCATCAGCCAGCGCAAGGAGATCGTTAAACGCGTTCCGCCACAGGCCGATACATTCAGTC 
GGCCACGGTGGTCGGGCCGACGGCTAGCATTCGTTGTCGCACTGGTGACCGTGCTGATGACTG 
CGGGCCTGCTCATTGGTCGCGCGATCATCCGCAGCAACTACTACGTAGCGGACTACGCCGGCA 

GCGTGTCCATCATGCGGGGGATTCAAGGGTCGCTACTGGGCATGTCCCTGCACCAGCCTTACC 
TGATGGGCTGCCTCAGCCCGCGTAACGAGCTGTCGCAGATCAGCTACGGACAGTCTGGGGGCC 
CTCTCGACTGCCATCTGATGAAACTGGAGGATCTGCGACCGCCGGAGCGCGCACAGGTTCGGG 
CCGGTCTCCCGGCCGGCACTCTCGATGACGCCATCGGGCAGTTGCGCGAACTGGCGGCCAACT 
CCCTGCTGCCGCCTTGCCCGGCGCCGCGTGCCACGTCCCCGCCCGGGCGCCCGGCCCCACCC 

ACCACCAGCGAGACAACCGAACCAAACGTCACCTCCTCGCCAGCCTCTCCATCACCCACCACCT 
CCGCGCCGGCCCCCACCGGAACTACTCCTGCCATCCCCACGAGTGCCTCCCCGGCAGCGCCC 
GCGTCGCCGCCGACGCCTTGGCCCGTCACCAGCTCGCCGACGATGGCCGCACTTCCGCCACC 
CCCGCCTCAGCCGGGCATCGACTGCCGGGCGGCGGCATGA 

>Rv0019c - TB.seq 23273:23737 MW:17153 
>emb|AL123456|MTBH37RV:c23737-23270, Rv0019c SEQ ID NO:9 

ATGCAGGGGTTGGTACTGCAACTGACGCGTGCCGGATTCTTGATGTTGTTGTGGGTATTCATCT 
GGTCCGTGCTACGGATCTTGAAGACCGACATTTATGCGCCGACCGGCGCGGTCATGATGCGCC 
GCGGCCTGGCGCTGCGAGGGACGCTCTTAGGCGCGCGTCAGCGCCGGCACGCTGCACGCTAC 
CTGGTGGTGACCGAAGGTGCGTTGACTGGCGCGCGTATCACGCTGAGCGAACAGCCGGTGTTG
ATCGGGCGCGCCGACGACTCGACCCTGGTGCTGACCGACGACTACGCCTCGACGCGGCACGC 
TCGGCTGTCTATGCGCGGCTCCGAGTGGTACGTCGAAGATCTAGGATCGACCAACGGCACTTA 






CCTGGACAGGGCGAAGGTGACGACTGCGGTACGAGTTCCGATCGGAACGCCGGTTCGCATCG 
GCAAAACTGCAATCGAGTTGCGCCCGTGA 

>Rv0020c - TB.seq 23864:25444 MW:56881 
>emb|AL123456|MTBH37RV:c25444-23861, Rv0020c SEQ ID NO:10

ATGGGTAGCCAGAAAAGGCTGGTTCAGCGCGTTGAGCGCAAACTCGAGCAGACGGTTGGCGAT 
GCGTTTGCCCGCATCTTTGGAGGCTCGATCGTCCCGCAAGAGGTCGAAGCCCTGCTGCGCCGC 
GAGGCGGCCGACGGCATCCAGTCGCTGCAGGGAAATCGCCTTTTGGCGCCCAACGAATACATC 
ATTACCCTCGGTGTGCACGACTTTGAGAAGTTGGGCGCTGATCCTGAGCTGAAGTCAACCGGTT 

TTGCTCGGGACTTGGCGGACTATATCCAAGAACAGGGGTGGCAAACGTATGGTGATGTGGTCGT
CCGATTCGAGCAGTCGTCGAACCTGCATACCGGCCAGTTCCGCGCCCGCGGCACTGTTAACCC 
CGACGTTGAGACCCACCCGCCGGTCATCGATTGCGCCCGGCCACAATCAAACCACGCGTTTGG 
CGCAGAACCAGGAGTAGCACCAATGAGTGACAATTCGAGCTACCGTGGCGGTCAGGGGCAGGG 
GCGTCCCGACGAGTATTACGACGACCGCTATGCGCGTCCGCAAGAGGATCCGCGTGGTGGCCC 

GGATCCGCAAGGCGGATCTGACCCCCGCGGGGGGTATCCACCCGAGACGGGCGGCTACCCGC
CCCAGCCGGGCTACCCACGCCCGCGCCACCCGGACCAGGGCGACTACCCCGAGCAAATCGGG 
TACCCCGACCAGGGCGGTTACCCCGAGCAACGCGGTTACCCCGAGCAACGCGGCTACCCCGA 
CCAGCGCGGGTACCAGGACCAGGGTCGAGGCTACCCCGACCAAGGGCAGGGGGGCTATCCGC 
CGCCCTACGAGCAACGCCCTCCTGTTTCTCCCGGCCCGGCTGCCGGCTACGGCGCTCCCGGCT 

ACGACCAGGGCTATCGCCAAAGCGGCGGCTACGGCCCTTCACCCGGTGGCGGCCAGCCCGGC
TACGGCGGGTACGGGGAGTACGGGCGTGGCCCGGCTCGCCACGAGGAGGGCAGCTATGTGCC 
CTCTGGCCCTCCGGGCCCGCCCGAGCAACGACCGGCTTACCCCGACCAAGGCGGTTACGACC 
AGGGCTACCAGCAAGGCGCCACGACATACGGCCGGCAAGACTATGGCGGCGGCGCTGACTAC 
ACCCGCTACACCGAATCCCCGCGGGTCCCGGGATACGCTCCTCAGGGTGGCGGGTACGCCGA 

ACCCGCCGGCCGAGACTACGACTACGGCCAATCAGGCGCTCCGGACTACGGTCAGCCAGCGC
CCGGTGGCTACAGCGGTTACGGGCAGGGCGGCTATGGGTCCGCCGGAACGTCGGTTACGCTG 
CAGCTCGACGACGGCAGCGGACGCACTTACCAGCTCCGCGAGGGCTCCAACATCATCGGTCGC 
GGACAGGACGCCCAGTTCCGGCTGCCCGACACCGGTGTGTCACGCCGTCACTTGGAGATCCG 
GTGGGACGGGCAGGTCGCATTGCTCGCAGACCTGAACTCCACCAACGGCACCACTGTTAACAA 

TGCACCGGTACAGGAGTGGCAGTTGGCCGACGGTGATGTGATCCGCTTGGGACACTCCGAGAT
CATCGTCCGCATGCACTGA 

>Rv0032 bioF2 C-terminal similar to B. subtilis BioF TB.seq 34295:36607 MW:86245 
>emb|AL123456|MTBH37RV:34295-36610, bioF2 SEQ ID NO:11 
ATGCCCACTGGCTTGGGCTATGACTTTCTGCGCCCTGTCGAGGACTCGGGGATCAACGACCTGA
AGCACTATTACTTCATGGCGGATTTGGCCGATGGGCAACCGCTAGGCCGGGCAAACCTCTATAG 
CGTCTGTTTCGACCTGGCCACCACCGACCGCAAGCTCACTCCGGCCTGGCGAACGACCATCAA 
ACGGTGGTTTCCGGGGTTTATGACCTTCCGTTTCCTCGAGTGCGGGTTGCTCACCATGGTGAGC
AACCCGCTGGCGTTGCGGTCCGACACCGACTTGGAGCGGGTATTGCCTGTGCTGGCCGGCCAG 
ATGGACCAGTTGGCGCATGACGACGGGTCGGATTTCTTGATGATCCGGGACGTGGACCCGGAA 
CACTACCAGCGATACCTTGACATCCTGCGCCCGTTGGGCTTTCGGCCTGCGCTGGGCTTTTCCC 
GGGTAGACACGACCATCAGCTGGTCGAGCGTGGAAGAGGCACTGGGCTGCCTGTCTCACAAAA
GGCGCCTGCCGTTGAAGACGTCGCTGGAGTTTCGTGAGCGGTTCGGTATCGAGGTCGAGGAAC 
TCGACGAGTATGCCGAGCATGCGCCGGTATTGGCCCGGCTTTGGCGCAACGTCAAGACGGAGG 
CAAAGGATTACCAGCGCGAGGACCTGAACCCTGAGTTCTTCGCGGCGTGTTCTCGGCATCTGCA 
TGGACGTAGCAGACTGTGGTTGTTCCGCTACCAGGGCACGCCAATTGCCTTCTTTTTGAACGTTT

GGGGTGCGGATGAGAACTACATACTGCTTGAGTGGGGCATCGATCGTGATTTTGAACATTATAG
GAAGGCGAATCTGTACCGGGCGGCGCTGATGCTCAGCCTAAAAGATGCGATCAGCCGAGATAA 
ACGGCGAATGGAAATGGGTATTACGAACTATTTCACAAAACTTCGCATTCCGGGTGCCCGAGTC 
ATACCGACCATCTATTTCCTGCGTCACAGCACGGATCCGGTGCATACGGCAACGTTAGCGCGAA 
TGATGATGCACAATATTCAACGGCCAACGCTACCCGACGATATGTCGGAGGAATTCTGTCGCTG 

GGAAGAGCGAATACGTCTGGACCAGGACGGGCTACCCGAACACGATATCTTTCGCAAGATCGAT
CGTCAGCACAAATACACGGGGCTCAAACTCGGCGGAGTCTACGGTTTTTATCCCCGATTCACCG 
GACCGCAGCGATCCACGGTCAAGGCCGCGGAGCTGGGCGAGATCGTGTTGCTGGGCACGAAC 
TCGTATCTGGGCCTGGCCACCCATCCAGAGGTGGTGGAGGCCTCGGCGGAGGCCACGCGACG 
GTACGGCACCGGCTGCTCGGGTTCGCCGTTGCTGAACGGCACGTTGGACTTGCACGTCTCGCT 

TGAGCAGGAACTAGCCTGTTTTTTGGGCAAACCCGCCGCCGTGTTGTGCTCCACCGGATATCAG
AGCAACCTGGCGGCGATCAGCGCGCTATGCGAATCCGGGGACATGATCATCCAAGACGCGCTG 
AACCACCGCAGCCTGTTCGACGCCGCCAGGTTGTCCGGGGCCGACTTCACCTTGTACCGGCAC 
AACGACATGGACCACCTGGCGCGGGTGCTACGCCGCACCGAGGGGCGCCGCCGGATCATCGT 
CGTGGACGCGGTGTTCAGCATGGAAGGCACCGTCGCCGACCTGGCCACCATCGCCGAGCTTG 

CCGACCGGCACGGCTGCCGGGTCTATGTGGACGAGTCCCATGCGCTGGGCGTGCTCGGCCCC
GACGGGCGAGGAGCTTCGGCCGCGTTGGGTGTCTTGGCGCGCATGGACGTGGTGATGGGCAC 
GTTCAGCAAATCCTTTGCCTCCGTCGGCGGGTTCATCGCCGGAGATCGGCCCGTCGTGGACTA 
CATCCGGCACAACGGTTCAGGTCATGTGTTTTCCGCCAGCCTGCCGCCGGCCGCCGCGGCTGC 
CACCCACGCGGCTCTGCGCGTCAGTCGGCGTGAACCCGACCGGCGGGCTCGGGTGCTGGCCG 

CGGCCGAGTACATGGCCACCGGCCTGGCACGGCAGGGCTATCAGGCCGAGTATCACGGAACC
GCGATCGTGCCGGTGATCCTGGGCAACCCGACCGTGGCGCATGCGGGCTATCTGCGGCTGAT 
GCGCTCCGGGGTGTATGTGAACCCGGTGGCCCCCCCAGCCGTGCCGGAGGAGCGTTCGGGAT 
TCCGCACCAGCTACCTAGCCGACCACCGACAATCTGACCTCGACCGGGCCTTGCACGTGTTTGC 
CGGCCTTGCCGAGGACCTGACCCCGCAAGGAGCCGCGCTATGA 

>Rv0050 ponA1 TB.seq 53661:55694 MW:71119 
>emb|AL123456|MTBH37RV:53661-55697, ponA SEQ ID NO:12 
GTGGTGATCCTGTTGCCGATGGTCACCTTCACGATGGCCTACCTGATCGTCGACGTTCCCAAGC
CAGGTGACATCCGTACCAACCAGGTCTCCACGATCCTTGCCAGCGACGGCTCGGAAATCGCCA 
AAATTGTTCCGCCCGAAGGTAATCGGGTCGACGTCAACCTCAGCCAGGTGCCGATGCATGTGC 
GCCAGGCGGTGATTGCGGCCGMGACCGCAATTTCTATTCGAATCCGGGATTCTCGTTCACCGG 
CTTCGCGCGGGCAGTCAAGAACAACCTGTTCGGCGGCGATCTGCAGGGCGGATCGACGATTAC
CCAGCAGTACGTCAAGAACGCGCTGGTCGGTTCCGCACAGCACGGGTGGAGCGGTCTGATGC 
GCAAGGCGAAAGAATTGGTCATCGCGACGAAGATGTCGGGGGAGTGGTCTAAAGACGATGTGC 
TGCAGGCGTATCTGAACATCATCTACTTCGGCCGGGGCGCCTACGGCATTTCGGCGGCGTCCA 
AGGCTTATTTCGACAAGCCCGTCGAGCAGCTGACCGTTGCCGAAGGGGCGTTGTTGGCAGCGC 

TGATTCGGCGGCCTTCGACGCTGGACCCGGCGGTCGACCCCGAAGGGGCCCATGCCCGCTGG
AATTGGGTACTCGACGGCATGGTGGAAACCAAGGCTCTCTCGCCGAATGACCGTGCGGCGCAG 
GTGTTTCCCGAGACAGTGCCGCCCGATCTGGCCCGGGCAGAGAATCAGACCAAAGGACCCAAC 
GGGCTGATCGAGCGGCAGGTGACAAGGGAGTTGCTCGAGCTGTTCAACATCGACGAGCAGACC 
CTCAACACCCAGGGGCTGGTGGTCACCACCACGATTGATCCGCAGGCCCAACGGGCGGCGGA 

GAAGGCGGTTGCGAAATACCTGGACGGGCAGGACCCCGACATGCGTGCCGCCGTGGTTTCCAT
CGACCCGCACAACGGGGCGGTGCGTGCGTACTACGGTGGCGACAATGCCAATGGCTTTGACTT 
CGCTCAAGCGGGATTGCAGACTGGATCGTCGTTTAAGGTGTTTGCTCTGGTGGCCGCCCTTGAG 
CAGGGGATCGGCCTGGGCTACCAGGTAGACAGCTCTCCGTTGACGGTCGACGGCATCAAGATC 
ACCAACGTCGAGGGCGAGGGTTGCGGGACGTGCAACATCGCCGAGGCGCTCAAAATGTCGCT 

GAACACCTCCTACTACCGGCTGATGCTCAAGCTCAACGGCGGCCCACAGGCTGTGGCCGATGC
CGCGCACCAAGCCGGCATTGCCTCCAGCTTCCCGGGCGTTGCGCACACGCTGTCCGAAGATGG 
CAAGGGTGGACCGCCCAACAACGGGATCGTGTTGGGCCAGTACCAAACCCGGGTGATCGACAT 
GGCATCGGCGTATGCCACGTTGGCCGCGTCCGGTATCTACCACCCGCCGCATTTCGTACAGAA 
GGTGGTCAGTGCCAACGGCCAGGTCCTCTTCGACGCCAGCACCGCGGACAACACCGGCGATCA 

GCGCATCCCCAAGGCGGTAGCCGACAACGTGACTGCGGCGATGGAGCCGATCGCAGGTTATTC
GCGTGGCCACAACCTAGCGGGTGGGCGGGATTCGGCGGCCAAGACCGGCACTACGCAATTTG 
GTGACACCACCGCGAACAAAGACGCCTGGATGGTCGGGTACACGCCGTCGTTGTCTACGGCTG 
TGTGGGTGGGCACCGTCAAGGGTGACGAGCCACTGGTAACCGCTTCGGGTGCAGCGATTTACG 
GCTCGGGCCTGCCGTCGGACATCTGGAAGGCAACCATGGACGGCGCCTTGAAGGGCACGTCX3 

AACGAGACTTTCCCCAAACCGACCGAGGTCGGTGGTTATGCCGGTGTGCCGCCGCCGCCGCCG
CCGCCGGAGGTACCACCTTCGGAGACCGTCATCCAGCCCACGGTCGAAATTGCGCCGGGGATT 
ACCATCCCGATCGGTCCCCCGACCACCATTACCCTGGCGCCACCGCCCCCGGCCCCGCCCGCT 
GCGACTCCCACGCCGCCGCCGTGA 

>Rv0051 - TB.seq 55694:57373 MW:61210

>emb|AL123456|MTBH37RV:55694-57376, Rv0051 SEQ ID NO:13 
GTGACCGGCGCGCTGTCCCAAAGCAGCAACATCTCGCCACTTCCTTTGGCCGCCGATCTGCGG
AGCGCCGATAACCGCGATTGCCCCAGCCGCACCGACGTATTGGGTGCCGCTCTGGCGAATGTC 
GTCGGTGGCCCGGTAGGCCGGCACGCGCTGATCGGCCGCACCCGGCTGATGACCCCGCTGCG 
GGTGATGTTTGCAATCGCGTTGGTGTTCCTGGCGCTCGGTTGGTCGACGAAAGCGGCCTGCTT 
GCAGTCCACCGGAACCGGTCCAGGTGATCAGCGGGTGGCCAACTGGGATAACCAGCGTGCTTA
CTACCAGTTGTGCTACTCCGATACGGTGCCGCTCTATGGCGCTGAGTTATTGAGCCAAGGCAAG 
TTTCCGTACAAATCAAGCTGGATCGAAACCGACAGCAACGGCACACCGCAGCTGCGCTACGAC 
GGACAGATCGCGGTGCGCTATATGGAGTATCCGGTGCTGACTGGGATCTATCAGTACCTGTCGA 
TGGCGATAGCCAAGACCTACACCGCGTTAAGCAAGGTGGCTCCCCTCCCGGTGGTTGCCGAAG 

TGGTGATGTTCTTCAACGTCGCCGCGTTCGGTTTGGCGCTGGCGTGGCTGACAACCGTCTGGG
CGACCTCGGGCCTGGCCGGCCGCCGGATATGGGATGCGGCGCTGGTGGCCGCCTCACCGCTG 
GTGATCTTTCAGATATTCACCAATTTCGATGCGCTGGCAACGGGTTTGGCGACGAGTGGGCTGC 
TGGCCTGGGCGCGGCGCAGACCGGTGCTTGCCGGTGTGCTGATCGGGTTGGGCTCCGCGGCG 
AAACTGTATCCGCTGTTGTTCTTGTACCCGTTGTTGCTGCTGGGCATCCGGGCCGGTCGCCTGA 

ATGCTCTGGCCCGCACCATGGCGGCCGCGGCGGCGACCTGGTTGTTGGTGAATCTGCCGGTGA
TGCTGCTCTTTCCGCGCGGCTGGTCGGAGTTCTTCCGGCTCAACACCCGGCGCGGCGACGACA 
TGGACTCGTTGTACAACGTCGTCAAGTCGTTCACCGGCTGGCGTGGCTTCGACCCCACCCTGG 
GCTTCTGGGAGCCGCCGCTGGTGCTGAACACGGTTGTCACGCTCTTGTTCGTGTTATGTTGTGC 
GGCAATTGCTTACATCGCGCTCACCGCACCCCACCGGCCGCGCGTGGCGCAGCTGACTTTCTT 

GACGGTGGCCAGCTTCCTGTTGGTCAACAAGGTGTGGAGTCCCCAGTTCTCGCTTTGGCTGGTG
CCGCTGGCCGTGCTGGCTTTGCCGCACCGCCGGATCTTGCTGGCGTGGATGACGATCGACGCG 
TTGGTGTGGGTGCCGCGGATGTACTACCTATACGGCAACCCGAGCCGCTCGCTGCCCGAGCAG 
TGGTTCACCACGACGGTGTTGCTGCGTGACATCGCCGTGATGGTGCTGTGCGGACTGGTGGTC 
TGGCAGATCTACCGCCCCGGGCGCGACCTCGTGCGTACCGGCGGGCCAGGGGCACTGCCGGC 

TTGTGGGGGAGTCGACGACCCGGTGGGAGGGGTCTTTGCCAACGCCGCCGACGCCCCGCCAG
GTCGGCTACCGTCGTGGCTGCGTCCCCGGCTGGGCGACGAGCATGCGCGAGAGAGGACGCCC 
GATGCAGGTCGCGATCGCACTTTTTCCGGGCAACACCGCGCTTGA 

>Rv0106 - TB.seq 124372:125565 MW:43701 

>emb|AL123456|MTBH37RV:124372-125568, Rv0106 SEQ ID NO:14

ATGCGTACTCCGGTGATATTGGTGGCAGGTCAGGATCACACCGACGAGGTGACGGGCGCCTTG 
TTGCGCCGGACCGGAACGGTGGTCGTGGAGCACCGGTTTGACGGCCATGTGGTGCGACGGAT 
GACTGCCACGCTGAGCCGTGGCGAATTGATCACGACGGAGGACGCTTTGGAGTTCGCCCACGG 
CTGTGTGTCGTGCACAATCCGCGACGACCTGCTGGTGCTGTTACGCAGACTGCACCGCCGAGA 

CAATGTCGGCCGGATCGTCGTGCACCTGGCGCCGTGGCTGGAGCCCCAGCCCATCTGCTGGG
CGATCGACCACGTGCGGGTTTGCGTCGGACACGGATACCCAGACGGACCAGCCGCCCTCGAC 
GTGCGGGTCGCGGCCGTGGTGACCTGTGTGGACTGCGTAAGGTGGCTGCCGCAGTCACTCGG 
CGAGGACGAACTGCCCGACGGGCGCACGGTGGCCCAAGTGACGGTCGGTCAGGCCGAGTTCG
CCGACCTTCTGGTGCTGACCCACCCGGAACCGGTGGCCGTGGCGGTTCTGCGCCGACTGGCC 
CCTCGAGCGCGAATCACCGGCGGCGTCGACCGCGTCGAGCTGGCGCTGGCGCATCTGGACGA 
CAACTCACGGAGGGGTCGTACCGATACCCCGCACACGCCATTGCTGGCGGGCCTGCCTCCGTT 
GGCAGCCGACGGTGAGGTTGCGATCGTGGAATTCAGTGCCCGCCGCCCGTTTCACCCGCAACG
TCTGCATGCCGCGGTTGACCTGCTGCTCGATGGCGTGGTTCGCACTCGAGGTCGGCTGTGGCT 
GGCCAACCGGCCGGATCAGGTCATGTGGCTCGAATCAGCCGGTGGCGGTCTGCGGGTCGCAT 
CGGCCGGAAAGTGGTTGGCGGCGATGGCGGCCTCGGAGGTGGCCTATGTCGACCTGGAGCGG 
CGGTTGTTCGCCGACCTGATGTGGGTCTACCCGTTCGGAGACCGGCACACCGCGATGACGGTA 
CTGGTATGCGGCGCCGATCCGACCGACATCGTCAATGCCCTGAACGCGGCGCTGCTCAGCGAC
GACGAAATGGCATCTCCGCAACGCTGGCAGTCCTACGTCGACCCTTTCGGCGACTGGCATGAC 
GACCCGTGCCACGAAATGCCCGATGCGGCTGGGGAATTCTCGGCACACCGCAACTCAGGAGAA 
TCTCGATGA 

>Rv0125 - TB.seq 151146:152210 MW:34927

>emb|AL123456|MTBH37RV:151146-152213, pepA SEQ ID NO:15 

ATGAGCAATTCGCGCCGCCGCTCACTCAGGTGGTCATGGTTGCTGAGCGTGCTGGCTGCCGTC 
GGGCTGGGCCTGGCCACGGCGCCGGCCCAGGCGGCCCCGCCGGCCTTGTCGCAGGACCGGT 
TCGCCGACTTCCCCGCGCTGCCCCTCGACCCGTCCGCGATGGTCGCCCAAGTGGGGCCACAG 

GTGGTCAACATCAACACCAAACTGGGCTACAACAACGCCGTGGGCGCCGGGACCGGCATCGTC
ATCGATCCCAACGGTGTCGTGCTGACCAACAACCACGTGATCGCGGGCGCCACCGACATCAAT 
GCGTTCAGCGTCGGCTCCGGCCAAACCTACGGCGTCGATGTGGTCGGGTATGACCGCACCCAG 
GATGTCGCGGTGCTGCAGCTGCGCGGTGCCGGTGGCCTGCCGTCGGCGGCGATCGGTGGCG 
GCGTCGCGGTTGGTGAGCCCGTCGTCGCGATGGGCAACAGCGGTGGGCAGGGCGGAACGCC 

CCGTGCGGTGCCTGGCAGGGTGGTCGCGCTCGGCCAAACCGTGCAGGCGTCGGATTCGCTGA
CCGGTGCCGAAGAGACATTGAACGGGTTGATCCAGTTCGATGCCGCGATCCAGCCCGGTGATT 
CGGGCGGGCCCGTCGTCAACGGCCTAGGACAGGTGGTCGGTATGAACACGGCCGCGTCCGAT 
AACTTCCAGCTGTCCCAGGGTGGGCAGGGATTCGCCATTCCGATCGGGCAGGCGATGGCGATC 
GCGGGCCAGATCCGATCGGGTGGGGGGTCACCCACCGTTCATATCGGGCCTACCGCCTTCCTC 

GGCTTGGGTGTTGTCGACAACAACGGCAACGGCGCACGAGTCCAACGCGTGGTCGGGAGCGC
TCCGGCGGCAAGTCTCGGCATCTCCACCGGCGACGTGATCACCGCGGTCGACGGCGCTCCGAT 
CAACTCGGCCACCGCGATGGCGGACGCGCTTAACGGGCATCATCCCGGTGACGTCATCTCGGT 
GACCTGGCAAACCAAGTCGGGCGGCACGCGTACAGGGAACGTGACATTGGCCGAGGGACCCC 
CGGCCTGA 


>Rv0350 dnaK 70 kD heat shock protein, chromosome replication TB.seq 419833:421707 MW:66832
>emb|AL123456|MTBH37RV:419833-421710, dnaK SEQ ID NO:16

ATGGCTCGTGCGGTCGGGATCGACCTCGGGACCACCAACTCCGTCGTCTCGGTTCTGGAAGGT 

GGCGACCCGGTCGTCGTCGCCAACTCCGAGGGCTCCAGGACCACCCCGTCAATTGTCGCGTTC 

GCCCGCAACGGTGAGGTGCTGGTCGGCCAGCCCGCCAAGAACCAGGCAGTGACCAACGTCGA 

TCGCACCGTGCGCTCGGTCAAGCGACACATGGGCAGCGACTGGTCCATAGAGATTGACGGCAA 

GAAATACACCGCGCCGGAGATCAGCGCCCGCATTCTGATGAAGCTGAAGCGCGACGCCGAGGC 

CTACCTCGGTGAGGACATTACCGACGCGGTTATCACGACGCCCGCCTACTTCAATGACGCCCAG 

CGTCAGGCCACCAAGGACGCCGGCCAGATCGCCGGCCTCAACGTGCTGCGGATCGTCAACGA 

GCCGACCGCGGCCGCGCTGGCCTACGGCCTCGACAAGGGCGAGAAGGAGCAGCGAATCCTGG 

TCTTCGACTTGGGTGGTGGCACTTTCGACGTTTCCCTGCTGGAGATCGGCGAGGGTGTGGTTGA 

GGTCCGTGCCACTTCGGGTGACAACCACCTCGGCGGCGACGACTGGGACCAGCGGGTCGTCG 

ATTGGCTGGTGGACAAGTTCAAGGGCACCAGCGGCATCGATCTGACCAAGGACAAGATGGCGA 

TGCAGCGGCTGCGGGAAGCCGCCGAGAAGGCAAAGATCGAGCTGAGTTCGAGTCAGTCCACCT 

CGATCAACCTGCCCTACATCACCGTCGACGCCGACAAGAACCCGTTGTTCTTAGACGAGCAGCT 

GACCCGCGCGGAGTTCCAACGGATCACTCAGGACCTGCTGGACCGCACTCGCAAGCCGTTCCA 

GTCGGTGATCGCTGACACCGGCATTTCGGTGTCGGAGATCGATCACGTTGTGCTCGTGGGTGG 

TTCGACCCGGATGCCCGCGGTGACCGATCTGGTCAAGGAACTCACCGGCGGCAAGGAACCCAA 

CAAGGGCGTCAACCCCGATGAGGTTGTCGCGGTGGGAGCCGCTCTGCAGGCCGGCGTCCTCA 

AGGGCGAGGTGAAAGACGTTCTGCTGCTTGATGTTACCCCGCTGAGCCTGGGTATCGAGACCA 

AGGGCGGGGTGATGACCAGGCTCATCGAGCGCAACACCACGATCCCCACCAAGCGGTCGGAG 

ACTTTCACCACCGCCGACGACAACCAACCGTCGGTGCAGATCCAGGTCTATCAGGGGGAGCGT 

GAGATCGCCGCGCACAACAAGTTGCTCGGGTCCTTCGAGCTGACCGGCATCCCGCCGGCGCC 

GCGGGGGATTCCGCAGATCGAGGTCACTTTCGACATCGACGCCAACGGCATTGTGCACGTCAC 

CGCCAAGGACAAGGGCACCGGCAAGGAGAACACGATCCGAATCCAGGAAGGCTCGGGCCTGT 

CCAAGGAAGACATTGACCGCATGATCAAGGACGCCGAAGCGCACGCCGAGGAGGATCGCAAGC 

GTCGCGAGGAGGCCGATGTTCGTAATCAAGCCGAGACATTGGTCTACCAGACGGAGAAGTTCG 

TCAAAGAACAGCGTGAGGCCGAGGGTGGTTCGAAGGTACCTGAAGACACGCTGAACAAGGTTG 

ATGCCGCGGTGGCGGAAGCGAAGGCGGCACTTGGCGGATCGGATATTTCGGCCATCAAGTCG 

GCGATGGAGAAGCTGGGCCAGGAGTCGCAGGCTCTGGGGCAAGCGATCTACGAAGCAGCTCA 

GGCTGCGTCACAGGCCACTGGCGCTGCCCACCCCGGCGGCGAGCCGGGCGGTGCCCACCCC 

GGCTCGGCTGATGACGTTGTGGACGCGGAGGTGGTCGACGACGGCCGGGAGGCCAAGTGA 

>Rv0351 grpE stimulates DnaK ATPase activity TB.seq 421707:422411 MW:24501
>emb|AL123456|MTBH37RV:421707-422414, grpE SEQ ID NO:17 

GTGACGGACGGAAATCAAAAGCCGGATGGCAATTCGGGCGAACAGGTAACCGTCACTGACAAG 

CGGCGGATCGATCCCGAGACGGGTGAAGTGCGGCACGTCCCTCCCGGCGACATGCCGGGAGG 

GACGGCTGCGGCCGATGCGGCGCACACCGAAGACAAGGTCGCCGAGCTGACCGCCGATCTGC 

AACGCGTGCAGGCCGACTTCGCCAACTACCGTAAGCGGGCGTTGCGCGATCAGCAGGCGGCC
GCTGACCGAGCCAAGGCCAGCGTTGTCAGCCAATTGCTGGGTGTACTGGACGATCTCGAGCGG 
GCGCGCAAGCACGGCGATTTGGAGTCGGGTCCACTGAAGTCGGTCGCCGACAAGCTAGACAGC 
GCGTTGACCGGGCTGGGTCTGGTGGCGTTCGGTGCCGAGGGCGAGGATTTCGACCCCGTGCT 
GCACGAAGCGGTGCAACACGAGGGCGACGGCGGGCAGGGGTCCAAGCCGGTAATCGGCACC
GTCATGCGGCAGGGCTACCAACTGGGTGAGCAGGTGCTGCGGCACGCCTTGGTCGGCGTCGT 
CGACACGGTGGTCGTCGACGCGGCCGAACTGGAGTCAGTCGACGACGGCACTGCGGTCGCAG 
ATACCGCCGAAAACGATCAAGCTGACCAGGGCAATAGCGCCGACACCTCGGGCGAACAGGCAG 
AATCAGAACCGTCGGGCAGTTAA 


>Rv0352 dnaJ acts with GrpE to stimulate DnaK ATPase TB.seq 422450:423634 MW:41346
>emb|AL123456|MTBH37RV:422450-423637, dnaJ SEQ ID NO:18

ATGGCCCAAAGGGAATGGGTCGAAAAAGACTTCTACCAGGAGCTGGGCGTCTCCTCTGATGCC 
AGTCCTGAAGAGATCAAACGTGCCTATCGGAAGTTGGCGCGCGACCTGCATCCGGACGCGAAC 

CCGGGCAACCCGGCCGCCGGCGAACGGTTCAAGGCGGTTTCGGAGGCGCATAACGTGCTGTC
GGATCCGGCCAAGCGCAAGGAGTACGACGAAACCCGCCGCCTGTTCGCCGGCGGCGGGTTCG 
GCGGCCGTCGGTTCGACAGCGGCTTTGGGGGCGGGTTCGGCGGTTTCGGGGTCGGTGGAGAC 
GGCGCCGAGTTCAACCTCAACGACTTGTTCGACGCCGCCAGCCGAACCGGCGGTACCACCATC 
GGTGACTTGTTCGGTGGCTTGTTCGGACGCGGTGGCAGCGCCCGTCCCAGCCGCCCGCGACG 

CGGCAACGACCTGGAGACCGAGACCGAGTTGGATTTCGTGGAGGCCGCCAAGGGCGTGGCGA
TGCCGCTGCGATTAACCAGCCCGGCGCCGTGCACCAACTGCCATGGCAGCGGGGCCCGGCCA 
GGCACCAGCCCAAAGGTGTGTCCCACTTGCAACGGGTCGGGCGTGATCAACCGCAATCAGGGC 
GCGTTCGGCTTCTCCGAGCCGTGCACCGACTGCCGAGGTAGCGGCTCGATCATCGAGCACCCC 
TGCGAGGAGTGCAAAGGCACCGGCGTGACCACCCGCACCCGAACCATCAACGTGCGGATCCC 

GCCCGGTGTCGAGGATGGGCAGCGCATCCGGCTAGCCGGTCAGGGCGAGGCCGGGTTGCGC
GGCGCTCCCTCGGGGGATCTCTACGTGACGGTGCATGTGCGGCCCGACAAGATCTTCGGCCGC 
GACGGCGACGACCTCACCGTCACCGTTCCGGTCAGCTTCACCGAATTGGCTTTGGGCTCGACG 
CTGTCGGTGCCTACCCTGGACGGCACGGTCGGGGTCCGGGTGCCCAAAGGCACCGCTGACGG 
CCGCATTCTGCGTGTGCGCGGACGCGGTGTGCCCAAGCGCAGTGGGGGTAGCGGCGACCTAC 

TTGTCACCGTGAAGGTGGCCGTGCCGCCCAATTTGGCAGGCGCCGCTCAGGAAGCTCTGGAAG
CCTATGCGGCGGCGGAGCGGTCCAGTGGTTTCAACCCGCGGGCCGGATGGGCAGGTAATCGC 
TGA 

>Rv0363c fba fructose bisphosphate aldolase TB.seq 441266:442297 MW:36545 
>emb|AL123456|MTBH37RV:c442297-441263, fba SEQ ID NO:19

ATGCCTATCGCAACGCCCGAGGTCTACGCGGAGATGCTCGGTCAGGCCAAACAAAACTCGTAC 
GCTTTCCCGGCTATCAACTGCACCTCCTCGGAAACCGTCAACGCCGCGATCAAAGGTTTCGCCG 
ACGCCGGCAGTGACGGAATCATCCAGTTCTCGACCGGTGGCGCAGAATTCGGCTCCGGCCTCG

GGGTCAAAGACATGGTGACCGGTGCGGTCGCCTTGGCGGAGTTCACCCACGTTATCGCGGCCA 

AGTACCCGGTCAACGTGGCGCTGCACACCGACCACTGCCCCAAGGACAAGTTGGACAGCTATG 

TCCGGCCCTTGCTGGCGATCTCGGCGCAACGCGTGAGCAAAGGTGGCAATCCTTTGTTCCAGT 

CGCACATGTGGGACGGCTCGGCAGTGCCAATCGATGAGAACCTGGCCATCGCCCAGGAGCTGC 

TCAAGGCGGCGGCGGCCGCCAAGATCATTCTGGAGATCGAGATCGGCGTCGTCGGCGGCGAA 

GAGGACGGCGTGGCGAACGAGATCAACGAGAAGCTGTACACCAGCCCGGAGGACTTCGAGAAA 

ACCATCGAGGCGCTGGGCGCCGGTGAGCACGGCAAATACCTGCTGGCCGCGACGTTCGGCAA 

CGTGCATGGCGTCTACAAGCCCGGCAACGTCAAGCTTCGCCCCGACATCCTTGCGCAAGGGCA 

ACAGGTGGCGGCGGCCAAGCTCGGACTGCCGGCCGACGCCAAGCCGTTCGACTTCGTGTTCC 

ACGGCGGCTCGGGTTCGCTTAAGTCGGAGATCGAGGAGGCGCTGCGCTACGGCGTGGTGAAG 

ATGAACGTCGACACCGACACCCAGTACGCGTTCACCCGCCCGATCGCCGGTCACATGTTCACC 

AACTACGACGGAGTGCTCAAGGTCGATGGCGAGGTGGGTGTCAAGAAGGTCTACGACCGGCGC 

AGCTACCTCAAGAAGGCCGAAGCTTCGATGAGCCAGCGGGTCGTTCAGGCGTGCAATGACCTG 

CACTGCGCCGGAAAGTCCCTAACCCACTAA 

>Rv0405 pks6 TB.seq 485729:489934 MW:147615
>emb|AL123456|MTBH37RV:485729-489937, pks6 SEQ ID NO:20

ATGACAGACGGTTCGGTCACTGCGGATAAGCTTCAAAAATGGTTTCGAGAGTACTTGTCCACGC 
ATATCGAGTGTCATCCAAATGAGGTCAGCCTAGACGTTCCGATTAGAGATTTAGGTTTGAAATCG 
ATTGATGTCTTAGCGATTCCCGGCGACCTCGGTGACAGATTTGGGTTTTGTATTCCCGATTTGGC 
CGTTTGGGATAATCCTAGCGCTAATGATTTGATTGATAGTCTGTTGAACCAGCGTAGTGCTGACT 
CGTTAAGAGAGAGTCATGGACACGCCGACAGGAACACGCAGGGTCGGGGCAGCATAAACGAGC 
CGGTTGCGGTCATCGGAGTGGGCTGTCGATTTCCGGGAGATATTGACGGCCCGGAACGGCTAT 
GGGACTTTCTGACCGAGAAGAAGTGTGCGATAACAGCGTATCCAGATCGTGGGTTCACGAATGC 
TGGAACTTTCGCGGAGTCCGGAGGCTTTTTAAAGGATGTCGCGGGTTTCGATAATAGATTTTTTG
ATATCCCGCCGGACGAGGCTCTGCGAATGGATCCGCAACAACGGTTGTTACTGGAGGTCTCTTG 
GGAAGCGTTAGAGCATGCAGGAATTATTCCTGAGTCATTAAGACTTTCACGTACGGGCGTATTC 
GTTGGGGTGTCGTCAACTGACTACGTCCGGCTTGTGTCAGCTAGCGCTCAGCAAAAGTCTACTA 
TTTGGGATAACACCGGCGGTTCTTCGAGTATTATTGCCAATAGAATCTCATACTTTCTCGATATTC 
AGGGTCCGTCCATTGTCATTGACACGGCATGCTCGTCATCCCTGGTCGCCGTGCATCTAGCCTG 
TCGAAGTCTCAGTACCTGGGACTGCGATATCGCACTTGTCGGTGGGACGAATGTTCTTATTTCAC 
CAGAACCATGGGGTGGGTTTAGGGAAGCGGGCATCTTGTCGCAGACAGGCTGCTGTCACGCGT 
TCGATAAATCCGCCGACGGGATGGTACGCGGTGAGGGATGCGGAGTTATCGTGCTGCAGCGCC 
TCAGTGATGCACGCCTTGAGGGCCGGCGGATATTAGCGATTCTGACGGGTTCAGCGGTCAATC 
AGGACGGTAAGTCCAACGGTATTATGGCGCCAAATCCTAGTGCGCAAATTGGTGTTCTTGAAAAT 
GCATGCAAGAGCGCTCGCGTCGATCCGCTGGAAATCGGCTACGTCGAGGCCCACGGGACCGG 

AACGTCGTTAGGGGATAGGATCGAGGCGCACGCCTTAGGCATGGTCTTTGGTCGCAAGAGACC
GGGATCTGGGCCCCTGATGATCGGGAGCATCAAGCCGAATATCGGCCATCTGGAAGGTGCGGC 
TGGCATCGCCGGATTGATCAAGGCGGTGTTGATGGTTGAGCGTGGCTCGCTGCTTCCGAGCGG 
GGGGTTTACGGAGCCAAATCCAGCTATCCCATTCACGGAATTGGGCCTGAGAGTTGTAGACGAA 
CTTCAGGAGTGGCCGGTGGTGGCGGGTCGGCCGCGCCGGGCTGGGGTGTCATCGTTCGGCTT
TGGCGGCACCAATGCGCATGTGATTGTCGAGGAAGCTGGTTCGGTTGGGGCGGACACGGTTTC 
GGGCCGCGCGGATGTTGGCGGTTCCGGTGGTGGGGTGGTGGCGTGGGTGATTTCGGGGAAGA 
CGGCTTCGGCGTTGGCTGCTCAGGCGGGTCGGTTGGGGCGGTATGTGCGGGCTCGGCCGGCG 
CTTGATGTTGTTGATGTGGGGTATTCGTTGGTGAGCACGCGGTCGGTGTTTGATCATCGGGCGG 

TGGTGGTCGGCCAGACTCGCGATGAGTTGCTGGCTGGGTTGGCTGGGGTGGTTGCTGGTCGG
CCGGAGGCTGGGGTGGTCTGCGGTGTTGGCAAGCCGGCGGGCAAGACGGCTTTTGTGTTTGC 
CGGTCAGGGCTCGCAGTGGCTGGGTATGGGTAGCGAGCTTTATGCTGCCTACCCGGTTTTCGC 
CGAGGCCCTCGATGCTGTGGTGGACGAGTTGGACCGGCACCTGCGGTATCCGCTGCGCGATGT 
GATCTGGGGGCACGACCAAGATCTGTTGAATACCACCGAATTCGCCCAGCCGGCGCTGTTTGC 

GGTGGAGGTGGCGCTGTATCGGCTGCTCATGTCGTGGGGGGTGCGGCCGGGTTTGGTGCTGG
GTCATTCGGTGGGCGAGTTGGCCGCGGCGCACGTCGCCGGGGCGCTGTGTTTGCCGGATGCG 
GCGATGCTGGTGGCCGCGCGTGGACGGTTGATGCAGGCGTTGCCCGCCGGCGGCGCCATGTT 
TGCGGTGCAGGCCCGTGAAGACGAGGTAGCGCCGATGCTGGGGCACGATGTGAGCATCGCGG 
CGGTCAATGGTCCGGCTTCGGTGGTGATCTCTGGTGCCCACGATGCGGTGAGCGCGATCGCTG 

ATCGGCTGCGCGGCCAGGGCCGTCGGGTCCACCGGTTGGCGGTCTCGCATGCCTTTCACTCG
GCGTTGATGGAGCCGATGATCGCTGAGTTCACAGCCGTTGCGGCCGAACTGTCTGTGGGCTTG 
CCCACGATCCCGGTCATTTCCAATGTGACCGGGCAGTTGGTGGCCGACGACTTCGCCTCAGCT 
GATTACTGGGCCCGGCATATCCGGGCGGTGGTGCGGTTTGGCGACAGTGTTCGTAGTGCCCAC 
TGCGCCGGTGCCAGTCGTTTCATCGAAGTCGGGCCCGGTGGCGGCTTGACGTCGTTGATCGAG 

GCATCGCTGGCCGACGCGCAGATCGTGTCGGTGCCCACGCTGCGCAAAGATCGGCCCGAACC
GGTCAGTGTGATGACGGCGGCGGCCCAGGGCTTCGTCTCGGGGATGGGCCTGGATTGGGCCT 
CGGTGTTTTCCGGGTACCGGCCCAAGCGGGTGGAGTTGCCGACGTATGCCTTCCAGCATCAAA 
AGTTCTGGCTCGCACCAGCCCCATCGGTCAGCGACCCCACCGCCGCCGGCCAGATCGGGGCT 
AGCGATGGTGGTGCTGAACTCTTGGCGTCCTCCGGGTTTGCCGCCCGGCTGGCCGGTCGGTCG 

GCCGACGAGCAACTCGCCGCAGCGATCGAGGTGGTATGTGAGCATGCCGCAGCGGTGCTGGG
GCGCGACGGCGCTGCCGGACTCGACGCTGGCCAGGCGTTTGCCGATTCGGGATTTAATTCCTT 
GAGTGCCGTGGAGCTACGTAACCGCTTAACAGCCGTCACCGCAGTAACGCTGCCGGCCACCGC 
GATCTTCGATCACCCCACCCCGACCGAACTAGCCCAGTATCTGATCACCCAAATAGACGGTCAC 
GGCAGCTCCGCCGCCGCAGCGGCAAACCCGGCGGAGCGAATCGATGCGCTCACCGATCTTTTT 

CTACAAGCTTGCGATGCGGGTCGGGATGCCGATGGTTGGAAGATGGTCGCCCTGGCGTCGAAT
ACGCGCGAGCGCATGAGCTCACCGGTTCGGAACAACGTATCGAAGAACGTCGCACTGCTGGCA 
GATGGTATCTCCGATGTGGTTGTAATTTGTATCCCAACTCTAACTGTGCTATCGGATCAGCGTGA 
ATATCGAGATATTGCGMTGCGATGACAGGCCGCCATTCGGTTTATTCGCTTACGCTTCCCGGG
TTCGATTCGTCTGATGCACTGCCGCAAAACGCGGATATGATTGTTGAAACCGTATCTAACGCAAT 
TATTGATGTGGTAGGCGGCAGCTGCCGTTTTGTGCTGTCGGGCTATTCATCGGGTGGGGTGTTG 
GCCTATGCCCTCTGCTCCCATCTGTCGGTCAAGCACCAGCGGAATCCCCTCGGAGTCGCACTCA 
TCGATACATATCTGCCTAGTCAGATCGCCAATCCTTCAATGAATGAAGGGTTCAGCCCCAACGAT
ACTGGGAAGGGCCTTTCCCGTGAAGTAATTCGAGTGGCCAGAATGTTGAATCGGTTAACTGCCA 
CCCGACTCACCGCGGCAGCCACCTATGCTGCAATCTTTCAGGCCTGGGAACCAGGTAGATCAAT 
GGCTCCGGTTCTTAACATCGTGGCGAAGGACCGAATAGCTACCGTCGAAAATTTACGCGAAGAA 
CGAATCAACCGGTGGCGAACTGCTGCTGCAGAGGCGGCCTATTCTGTAGCCGAAGTACCCGGG 
GATCATTTCGGAATGATGAGCACCTCGAGTGAGGCAATAGCTACCGAAATACATGATTGGATTTC
TGGGCTCGTTCGAGGGCCTCATCGGTAG 

>Rv0435c - ATPase of AAA-family TB.seq 522348:524531 MW:75315
>emb|AL123456|MTBH37RV:c524531-522345, Rv0435c SEQ ID NO:21 

GTGACCCACCCGGACCCGGCCCGCCAACTCACCCTTACCGCCCGGCTGAACACCTCGGCCGTC
GACTCACGCCGCGGCGTCGTTCGGTTGCACCCCAATGCCATTGCTGCCCTTGGCATCCGCGAG 
TGGGACGCGGTGTCGCTGACCGGCTCTCGGACAACCGCCGCGGTCGCCGGCCTGGCCGCGGC 
AGACACCGCGGTCGGGACGGTGCTGCTCGATGACGTCACACTGTCCAATGCGGGCCTTCGCGA 
AGGCACCGAGGTGATCGTCAGCCCGGTCACCGTCTACGGAGCGCGATCGGTGACGCTGAGCG 

GTTCAACGCTGGCCACCCAGTCGGTGCCGCCGGTCACGCTGCGGCAGGCCCTACTCGGCAAG
GTGATGACCGTCGGTGACGCGGTCTCGCTGCTGCCCCGCGATCTAGGCCCCGGCACATCCACG 
TCGGCTGCCAGCCGCGCATTGGCAGCTGCGGTCGGGATCAGTTGGACCTCGGAGCTGCTGACC 
GTTACCGGCGTCGACCCCGACGGGCCGGTCAGCGTGCAGCCCAACTCGCTGGTCACCTGGGG 
CGCTGGGGTCCCGGCCGCAATGGGTACGTCCACGGCCGGGCAAGTGAGCATCTCGAGTCCGG 

AGATCCAGATCGAAGAGCTCAAGGGCGCCCAGCCGCAGGCTGCCAAGCTCACCGAATGGCTCA
AGCTTGCCCTCGATGAGCCGCACCTACTACAGACCTTGGGCGCCGGCACCAATTTGGGTGTGC 
TGGTGTCGGGTCCGGCCGGGGTGGGCAAGGCGACGCTGGTGCGCGCGGTGTGCGACGGCCG 
AAGGTTGGTGACACTGGATGGTCCGGAGATTGGAGCTCTGGCCGCCGGAGACCGGGTCAAAGC 
CGTGGCCTCGGCAGTGCAGGCGGTTCGCCATGAGGGCGGTGTGTTGCTGATCACCGATGCCGA 

CGCCCTGCTGCCAGCCGCCGCCGAGCCGGTAGCCTCGCTGATCCTGTCCGAGCTGCGTACCG
CGGTGGCCACCGCCGGTGTGGTATTGATCGCCACCTCAGCACGGCCCGATCAACTCGATGCCC 
GGCTGCGTTCCCCCGAGTTGTGCGACCGGGAGCTTGGCCTGCCGCTGCCCGACGCGGCCACC 
CGCAAATCGCTGCTGGAGGCGCTGCTGAATCCGGTTCCTACCGGAGACCTCAACCTCGACGAA 
ATCGCCTCCCGCACACCGGGTTTCGTCGTGGCCGACCTGGCTGCGCTGGTTCGCGAGGCGGC 

GCTGCGGGCAGCGTCTCGAGCCAGTGCCGACGGCCGACCACCGATGCTGCACCAAGACGACC
TCCTCGGTGCGTTGACCGTCATCCGGCCGCTGTCCCGCTCGGCCAGCGACGAAGTCACCGTGG 
GTGACGTGACGCTCGACGATGTCGGTGACATGGCCGCGGCCAAACAAGCACTGACCGAGGCG 
GTGCTGTGGCCGCTGCAGCACCCCGACACCTTCGCTCGGCTAGGTGTCGAACCGCCGCGCGG
GGTGTTGCTGTACGGCCCGCCCGGCTGCGGCAAGACCTTTGTGGTTCGTGCCCTGGCCAGCAC 
CGGACAGTTGAGCGTGCATGCCGTCAAAGGGTCGGAGCTGATGGACAAGTGGGTGGGCTCCTC 
GGAGAAGGCAGTCCGCGAGCTATTCCGGCGGGCCCGCGACTCCGCGCCGTCACTGGTGTTCC 
TCGACGAGCTGGACGCTCTGGCGCCACGGCGCGGTCAGAGCTTCGACTCGGGCGTCTCCGAC
CGGGTGGTGGCCGCGCTGCTGACTGAGCTCGACGGTATTGACCCGCTGCGGGATGTCGTCATG 
CTAGGCGCGACCAACCGGCCCGATCTGATAGACCCGGCGCTGCTGCGCCCGGGGCGGCTAGA 
ACGGCTGGTGTTCGTTGAACCGCCCGACGCTGCCGCTCGCCGCGAAATCCTGCGCACCGCTGG 
CAAGTCGATCCCGCTGAGCTCCGACGTCGACCTGGACGAGGTGGCAGCCGGACTCGACGGTTA 
TAGTGCCGCCGACTGTGTGGCGCTGCTGCGCGAAGCCGCGCTTACCGCGATGCGGCGTTCCAT
CGATGCCGCCAACGTCACCGCCGCCGACCTGGCGACCGCGCGAGAAACCGTGCGCGCGTCGC 
TGGATCCGCTGCAGGTGGCGTCGCTGCGTAAGTTCGGCACCAAGGGTGACCTTCGGTCCTAG 

>Rv0436c pssA CDP-diacylglycerol-serine o-phosphatidyltransferase TB.seq 524531:525388 MW:31219
>emb|AL123456|MTBH37RV:c525388-524528, pssA SEQ ID NO:22

ATGATCGGAAAGCCCCGCGGCAGGCGAGGGGTAAACCTGCAGATACTGCCCAGCGCGATGAC 
GGTGCTGTCCATTTGCGCGGGACTGACCGCAATCAAGTTTGCGCTCGAGCACCAGCCGAAGGC 
CGCGATGGCACTGATCGCCGCAGCGGCCATCCTCGACGGGCTCGACGGCCGGGTGGCCCGCA 
TCCTGGATGCCCAGTCGCGGATGGGCGCAGAGATCGACTCACTGGCCGACGCGGTGAACTTCG 
GAGTGACACCCGCGCTGGTGCTTTACGTGTCGATGTTGTCGAAGTGGCCGGTCGGTTGGGTGG
TCGTGCTGCTCTACGCGGTGTGCGTGGTATTACGGCTGGCGCGGTACAACGCACTGCAGGACG 
ACGGAACCCAGCCCGCCTACGCGCATGAATTCTTCGTCGGAATGCCCGCGCCGGCGGGCGCG 
GTTTCCATGATCGGCCTGCTAGCCCTCAAAATGCAGTTCGGCGAAGGATGGTGGACCTCGGGCT 
GGTTCCTCAGCTTTTGGGTGACGGGAACGTCGATACTCTTGGTCAGCGGGATCCCGATGAAAAA 
GATGCACGCCGTGTCGGTACCACCCAACTACGCGGCCGCCCTGCTGGCGGTGCTGGCTATCTG
CGCGGCGGCCGCAGTCCTGGCCCCCTACTTGTTGATCTGGGTGATCATCATCGCCTACATGTGC 
CATATTCCTTTCGCGGTGCGCAGCCAGCGCTGGCTTGCCCAACACCCTGAGGTGTGGGACGAC 
AAGCCCAAGCAACGGCGCGCGGTGCGGCGCGCGAGCCGCCGGGCGCATCCCTACCGGCCGT 
CGATGGCGCGGCTGGGCCTGCGCAAGCCGGGTCGACGGCTGTGA 


>Rv0440 groEL2 60 kD chaperonin 2 TB.seq 528606:530225 MW:56728
>emb|AL123456|MTBH37RV:528606-530228, groEL2 SEQ ID NO:23 

ATGGCCAAGACAATTGCGTACGACGAAGAGGCCCGTCGCGGCCTCGAGCGGGGCTTGAACGC 
CCTCGCCGATGCGGTAAAGGTGACATTGGGCCCCAAGGGCCGCAACGTCGTCCTGGAAAAGAA 
GTGGGGTGCCCCCACGATCACCAACGATGGTGTGTCCATCGCCAAGGAGATCGAGCTGGAGGA
TCCGTACGAGAAGATCGGCGCCGAGCTGGTCAAAGAGGTAGCCAAGAAGACCGATGACGTCGC 
CGGTGACGGCACCACGACGGCCACCGTGCTGGCCCAGGCGTTGGTTCGCGAGGGCCTGCGCA 
ACGTCGCGGCCGGCGCCAACCCGCTCGGTCTCAAACGCGGCATCGAAAAGGCCGTGGAGAAG
GTCACCGAGACCCTGCTCAAGGGCGCCAAGGAGGTCGAGACCAAGGAGCAGATTGCGGCCAC 
CGCAGCGATTTCGGCGGGTGACCAGTCCATCGGTGACCTGATCGCCGAGGCGATGGACAAGGT 
GGGCAACGAGGGCGTCATCACCGTCGAGGAGTCCAACACCTTTGGGCTGCAGCTCGAGCTCAC 
CGAGGGTATGCGGTTCGACAAGGGCTACATCTCGGGGTACTTCGTGACCGACCCGGAGCGTCA
GGAGGCGGTCCTGGAGGACCCCTACATCCTGCTGGTCAGCTCCAAGGTGTCCACTGTCAAGGA 
TCTGCTGCCGCTGCTCGAGAAGGTCATCGGAGCCGGTAAGCCGCTGCTGATCATCGCCGAGGA 
CGTCGAGGGCGAGGCGCTGTCCACCCTGGTCGTCAACAAGATCCGCGGCACCTTCAAGTCGGT 
GGCGGTCAAGGCTCCCGGCTTCGGCGACCGCCGCAAGGCGATGCTGCAGGATATGGCCATTCT 

CACCGGTGGTCAGGTGATCAGCGAAGAGGTCGGCCTGACGCTGGAGAACGCCGACCTGTCGC
TGCTAGGCAAGGCCCGCAAGGTCGTGGTCACCAAGGACGAGACCACCATCGTCGAGGGCGCC 
GGTGACACCGACGCCATCGCCGGACGAGTGGCCCAGATCCGCCAGGAGATCGAGAACAGCGA 
CTCCGACTACGACCGTGAGAAGCTGCAGGAGCGGCTGGCCAAGCTGGCCGGTGGTGTCGCGG 
TGATCAAGGCCGGTGCCGCCACCGAGGTCGAACTCAAGGAGCGCAAGCACCGCATCGAGGAT 

GCGGTTCGCAATGCCAAGGCCGCCGTCGAGGAGGGCATCGTCGCCGGTGGGGGTGTGACGCT
GTTGCAAGCGGCCCCGACCCTGGACGAGCTGAAGCTCGAAGGCGACGAGGCGACCGGCGCCA 
ACATCGTGAAGGTGGCGCTGGAGGCCCCGCTGAAGCAGATCGCCTTCAACTCCGGGCTGGAGC 
CGGGCGTGGTGGCCGAGAAGGTGCGCAACCTGCCGGCTGGCCACGGACTGAACGCTCAGACC 
GGTGTCTACGAGGATCTGCTCGCTGCCGGCGTTGCTGACCCGGTCAAGGTGACCCGTTCGGCG 

CTGCAGAATGCGGCGTCCATCGCGGGGCTGTTCCTGACCACCGAGGCCGTCGTTGCCGACAAG
CCGGAAAAGGAGAAGGCTTCCGTTCCCGGTGGCGGCGACATGGGTGGCATGGATTTCTGA 

>Rv0482 murB TB.seq 570537:571643 MW:38522 
>emb|AL123456|MTBH37RV:570537-571646, murB SEQ ID NO:24 

ATGAAACGGAGCGGTGTCGGTTCGCTCTTTGCCGGTGCGCATATTGCCGAGGCGGTCCCGTTG
GCGCCGCTGACCACTTTGCGTGTGGGCCCGATCGCCCGACGTGTCATCACTTGCACCAGCGCC 
GAACAGGTGGTGGCTGCGCTGCGGCACCTGGATTCGGCGGCCAAGACCGGAGCTGACCGCCC 
GCTGGTGTTTGCTGGTGGCTCCAATTTGGTGATCGCCGAGAACCTGACCGACCTGACCGTGGT 
GCGGTTGGCCAATAGCGGCATCACCATCGACGGTAACTTGGTGCGGGCCGAGGCCGGTGCGG 

TCTTCGATGACGTGGTGGTTAGGGCCATCGAACAGGGTCTGGGCGGACTGGAATGCCTGTCTG
GCATCCCAGGATCGGCCGGGGCGACACeCGTGCAGAACGTGGGGGCGTATGGCGCGGAGGT 
GTCTGACACCATCACTCGGGTTCGGCTTTTGGATCGGTGCACGGGTGAGGTGCGTTGGGTATC 
CGCGCGCGACCTGCGCTTCGGCTATCGCACGAGCGTGCTCAAACACGCTGATGGGCTTGCGGT 
GCCCACCGTGGTCTTGGAGGTGGAGTTTGCGCTGGATCCGTCGGGCCGCAGCGCACCGCTGC 

GCTACGGCGAGCTGATCGCCGCGCTGAATGCGACCAGCGGCGAGCGCGCCGACCCGCAAGCG
GTCCGCGAAGCGGTGCTGGCCCTGCGGGCACGCAAGGGCATGGTGCTiGGACCCGACCGACCA 
TGACACCTGGAGCGTGGGATCGTTCTTCACAAACCCGGTGGTCACCCAGGATGTTTACGAACGG 
CTGGCCGGTGACGCGGCCACCAGAAAGGACGGTCCGGTCCCGCACTATCCCGCGCCCGACGG
CGTCAAGCTGGCCGCCGGCTGGCTGGTGGAACGGGCCGGCTTCGGCAAGGGCTATCCGGATG 
CCGGCGCCGCCCCATGCCGGCTTTCCACCAAACATGCGCTGGCGCTGACAAATCGTGGCGGG 
GCCACCGCCGAAGATGTGGTGACGCTGGCGCGCGCCGTGCGCGATGGGGTCCATGATGTGTTT 
GGTATCACACTAAAACCCGAACCCGTGCTGATCGGCTGCATGTTGTAG

>Rv0483 - TB.seq 571708:573060 MW:47859 

>emb|AL123456|MTBH37RV:571708-573063, Rv0483 SEQ ID NO:25 

GTGGTCATTCGTGTGCTGTTTCGCCCGGTATCTTTGATACCCGTGAATAACTCCAGCACCCCCCA 

GAGTCAGGGGCCGATCAGTCGGCGTCTGGCGTTGACGGCCCTTGGGTTTGGGGTGTTGGCACC
GAACGTTCTGGTCGCGTGCGCCGGCAAAGTGACCAAGCTGGCCGAGAAGAGGCCGCCACCGG 
CGCCTCGTCTGACTTTCCGGCCTGCCGACTCTGCCGCCGACGTGGTGCCGATCGCGCCGATCA 
GCGTCGAGGTCGGTGACGGCTGGTTTCAGCGGGTCGCGCTGACCAATTCGGCAGGCAAGGTC 
GTCGCCGGGGCATACAGCCGGGATCGCACCATCTACACGATCACCGAGCCGCTGGGCTACGAC 

ACGACCTACACCTGGAGCGGTTCGGCCGTCGGCCATGACGGCAAGGCGGTTCCGGTGGCGGG
CAAGTTCACCACCGTGGCACCCGTCAAGACGATCAACGCGGGATTCCAGCTCGCCGACGGCCA 
GACCGTCGGGATCGCGGCGCCGGTGATTATTCAGTTCGATTCACCGATCAGCGACAAGGCCGC 
CGTCGAGCGGGCACTAACCGTGACCACCGACCCGCCTGTCGAGGGCGGCTGGGCCTGGCTGC 
CCGACGAGGCGCAGGGCGCTCGCGTGCACTGGCGTCCTCGGGAGTACTACCCGGCGGGTACC 

ACCGTCGACGTCGACGCCAAGCTGTATGGGCTGCCGTTCGGCGACGGCGCGTACGGCGCGCA
GGATATGTCGTTGCACTTCCAGATCGGTCGTCGTCAGGTGGTCAAGGCCGAAGTCTCGTCGCAC 
CGCATCCAAGTCGTCACCGATGCCGGCGTCATCATGGACTTCCCGTGCAGCTACGGCGAGGCC 
GACTTGGCGCGCAACGTCACCCGCAACGGCATCCACGTCGTCACCGAGAAATACTCGGACTTC 
TACATGTCCAACCCGGCCGCCGGTTACAGCCATATCCACGAACGTTGGGCGGTGCGGATTTCC 

AACAACGGCGAGTTCATCCATGCCAACCCTATGAGCGCCGGTGCCCAGGGCAACAGCAATGTC
ACCAACGGCTGTATCAACCTGTCGACGGAGAACGCCGAACAGTACJACCGCAGCGCGGTCTAC 
GGTGACCCGGTTGAGGTGACCGGCAGTTCGATCCAGCTGTCCTACGCCGACGGTGACATCTGG 
GACTGGGCGGTGGACTGGGACACCTGGGTGTCGATGTCGGCGCTACCGCCACCGGCGGCCAA 
ACCGGCGGCGACGCAAATCCCGGTCACCGCCCCGGTCACGCCGTCGGATGCCCCCACCCCGT 

CCGGCACACCCACGACTACTAACGGACCGGGTGGGTAG

>Rv0489 gpm phosphoglycerate mutase I TB.seq 578424:579170 MW:27217 
>emb|AL123456|MTBH37RV:578424-579173, gpm SEQ ID NO:26 

ATGGCAAACACTGGCAGCCTGGTGTTGCTGCGCCACGGCGAGAGCGACTGGAATGCCCTCAAC 
CTGTTCACCGGCTGGGTCGATGTCGGCCTGACGGACAAGGGCCAGGCAGAGGCGGTTCGAAG
CGGCGAGCTGATCGCGGAACACGACCTATTGCCCGACGTGCTCTACACCTCGTTGCTGCGGCG 
CGCGATCACCACCGCGCATCTGGCGTTGGACAGCGCCGATCGGCTCTGGATTCCCGTGCGGCG 
TAGCTGGCGGCTCAACGAACGCCACTACGGCGCGCTGCAGGGTTTGGACAAGGCCGAGACCAA
GGCCCGCTATGGCGAAGAGCAGTTCATGGCCTGGCGGCGCAGCTATGACACGCCGCCGCCGC 
CGATCGAGCGGGGCAGTCAGTTCAGCCAGGACGCCGACCCTCGTTACGCCGACATCGGCGGT 
GGCCCGCTCACCGAATGTCTGGCTGACGTGGTCGCCCGGTTTTTGCCATATTTCACCGACGTCA 
TCGTTGGCGACTTGCGGGTCGGCAAGACGGTGCTGATCGTTGCCCACGGCAACTCGTTGCGCG
CGCTGGTCAAGCACCTGGACCAGATGTCTGACGACGAAATCGTCGGACTGAACATCCCGACCG 
GAATTCCGCTGCGCTACGACCTGGATTCCGCGATGAGGCCGCTGGTGCGCGGTGGTACGTATC 
TGGACCCGGAGGCGGCAGCCGCCGGCGCCGCCGCGGTGGCCGGCCAGGGCCGCGGGTAA 

>Rv0490 senX3 sensor histidine kinase TB.seq 579347:580576 MW:44794
>emb|AL123456|MTBH37RV:579347-580579, senX3 SEQ ID NO:27 

GTGACTGTGTTCTCGGCGCTGTTGCTGGCCGGGGTTTTGTCCGCGCTGGCACTGGCCGTCGGT 
GGTGCTGTTGGAATGGGGCTGACGTCGCGGGTCGTCGAACAGCGCCAACGGGTGGCCACGGA 
GTGGTCGGGAATCACGGTTTCGCAGATGTTGCAATGCATTGTCACGCTGATGCCGCTGGGCGC 

CGCGGTGGTGGACACCCATCGCGACGTTGTCTACCTCAACGAACGGGCCAAAGAGCTAGGTCT
GGTGCGCGACCGCCAGCTCGATGATCAGGCCTGGCGGGCCGCCCGGCAGGCGCTGGGTGGT 
GAAGACGTCGAGTTCGACCTGTCGCCGCGCAAGCGGTCGGCCACGGGTCGATCCGGGCTATC 
AGTGCATGGGCATGCCCGGTTGCTGAGCGAGGAAGACCGCCGGTTCGCCGTGGTGTTCGTGCA 
CGACCAGTCGGATTATGCGCGGATGGAGGCGGCTAGGCGTGACTTCGTGGCCAACGTCAGTCA 

CGAGCTCAAGACGCCCGTCGGTGCCATGGCTCTACTCGCCGAGGCGCTGCTGGCGTCGGCCG
ACGACTCCGAAACCGTTCGGCGGTTCGCCGAGAAGGTGCTCATTGAGGCCAACCGGCTCGGTG 
ACATGGTCGCCGAGTTGATCGAGCTATCCCGGCTACAGGGCGCCGAGCGGCTACCCAATATGA 
CCGACGTCGACGTCGATACGATTGTGTCGGAAGCGATTTCACGCCATAAGGTGGCGGCCGACA 
ACGCCGACATCGAAGTCCGCACCGACGCGCCCAGCAATCTGCGGGTGCTGGGCGACCAAACTC 

TGCTGGTTACCGCACTGGCAAACCTGGTTTCCAATGCGATTGCCTATTCGCCGCGCGGGTCGCT
GGTGTCGATCAGCCGTCGCCGTCGCGGTGCCAACATCGAGATCGCCGTCACCGACCGGGGCA 
TCGGCATCGCGCCGGAAGACCAGGAGCGGGTCTTCGAACGGTTCTTCCGGGGGGACAAGGCG 
CGCTCGCGTGCCACCGGAGGCAGCGGACTCGGGTTGGCCATCGTCAAACACGTCGCGGCTAAT 
CACGACGGCACCATCCGCGTGTGGAGCAAACCGGGAACCGGGTCAACGTTCACCTTGGCTCTT 

CCGGCGTTGATCGAGGCCTATCACGACGACGAGCGACCCGAGCAGGCGCGAGAGCCCGAACT
GCGGTCAAACAGGTCACAACGAGAGGAAGAGCTGAGCCGATGA 

>Rv0500 proC pyrroline-5-carboxylate reductase TB.seq 590081:590965 MW:30172
>emb|AL123456|MTBH37RV:590081-590968, proC SEQ ID NO:28 
ATGCTTTTCGGCATGGCAAGGATCGCGATTATCGGCGGCGGCAGCATCGGTGAGGCATTGCTG
TCGGGTCTGCTGCGGGCGGGCCGGCAGGTCAAAGACCTGGTAGTGGCCGAGCGGATGCCCGA 
TCGCGCCAACTACCTGGCGCAGACCTATTCGGTGTTGGTGACGTCGGCGGCCGACGCGGTGGA 
GAACGCGACGTTCGTCGTCGTCGCGGTCAAACCAGCCGACGTCGAGCCGGTGATCGCGGATCT
GGCGAACGCGACTGCGGCGGCCGAAAACGACAGTGCTGAGCAGGTGTTCGTCACCGTGGTAG 
CGGGCATCACGATCGCGTATTTCGAATCCAAGCTACCGGCTGGGACGCCAGTGGTGCGTGCGA 
TGCCGAACGCGGCGGCATTGGTGGGAGCGGGGGTTACAGCGCTGGCCAAAGGCCGCTTTGTC 
ACCCCGCAACAGCTTGAGGAGGTCTCGGCCTTGTTCGACGCGGTCGGCGGCGTGCTGACCGTT
CCGGAATCGCAGTTGGACGCGGTGACCGCGGTGTCCGGCTCGGGTCCGGCCTATTTCTTTCTG 
CTGGTCGAGGCCCTGGTGGATGCCGGAGTCGGGGTGGGCTTGAGCCGTCAGGTGGCCACCGA 
TCTCGCCGCGCAGACAATGGCTGGCTCAGCGGCGATGCTGCTGGAGCGGATGGAGCAAGACC 
AGGGTGGCGCCAATGGCGAGCTGATGGGGCTGCGCGTGGACCTTACCGCATCACGGCTGCGC
GCCGCGGTTACCTCGCCGGGCGGTACGACCGCCGCTGCGCTGCGGGAACTCGAACGCGGCG
GGTTTCGGATGGCTGTCGACGCGGCGGTTCAAGCCGCCAAAAGCCGCTCTGAGCAGCTCAGAA 
TTACACCGGAATGA 

>Rv0528 - TB.seq 618303:619889 MW:57132 

>emb|AL123456|MTBH37RV:618303-619892, Rv0528 SEQ ID NO:29

ATGTGGCGGTCGTTGACGTCGATGGGCACCGCGCTGGTGCTGCTGTTTTTGCTCGCGCTGGCT
GCCATACCCGGGGCCCTGCTGCCGCAGCGTGGCCTCAACGCCGCCAAGGTGGACGACTACCT 
GGCCGCGCACCCACTCATCGGTCCGTGGCTGGACGAGCTGCAGGCCTTCGACGTGTTCTCCAG 
CTTCTGGTTCACCGCCATCTACGTGCTGCTGTTCGTGTCCCTCGTCGGCTGTCTGGCCCCGCGG 
ACGATCGAGCACGCCCGCAGCCTGCGGGCTACACCGGTCGCCGCCCCGCGCAACCTGGCCCG 

GCTGCCCAAGCACGCCCACGCCCGGCTGGCCGGCGAGCCCGCCGCCCTGGCCGCCACCATCA
CGGGCCGGCTGCGCGGCTGGCGCAGCATCACCCGGCAACAAGGCGACAGCGTGGAAGTCTCC 
GCCGAGAAGGGCTACCTGCGCGAGTTCGGCAACCTGGTGTTCCACTTCGCGCTGCTGGGTCTG 
CTGGTGGCGGTGGCCGTCGGCAAGCTGTTCGGCTACGAGGGCAACGTGATCGTGATAGCCGA 
CGGCGGACCCGGTTTTTGTTCGGCGTCGCCGGCCGCGTTCGACTCGTTTCGCGCCGGCAACAC 

CGTCGACGGCACGTCGTTGCACCCGATCTGTGTGCGGGTCAACAACTTCCAAGCGCACTACCT
GCCGTCCGGGCAGGCCACCTCGTTCGCCGCCGACATCGACTATCAGGCCGACCCGGCCACTG 
CTGACCTGATCGCCAACAGCTGGCGGCCCTACCGGCTGCAGGTCAATCACCCGCTGCGGGTCG 
GCGGCGACCGGGTGTACCTGCAGGGCCACGGCTATGCGCCCACCTTCACCGTGACGTTCCCG 
GACGGGCAGACCCGCACGTCGACCGTGCAGTGGCGACCCGACAACCCGCAGACCCTGCTGTC 

GGCGGGCGTCGTGCGCATCGACCCGCCGGCCGGCAGCTACCCCAACCCCGACGAGCGTCGCA
AACACCAGATCGCCATCCAGGGCCTGCTGGCTCCCACCGAGCAGCTCGACGGCACCCTGCTGT 
CGTCGCGTTTCCCCGCGCTCAATGCCCCGGCGGTGGCCATCGACATCTACCGCGGCGACACCG 
GCCTGGACAGCGGGCGGCCCCAGTCGTTGTTCACCCTGGACCACCGGCTGATCGAGCAGGGC 
CGGCTGGTCAAGGAAAAGCGGGTCAACCTGCGCGCCGGTCAGCAAGTCCGCATCGACCAAGG 

CCCGGCGGCCGGCACGGTGGTCCGGTTCGACGGCGCGGTGCCGTTCGTCAACCTGCAGGTCT
CCCACGACCCCGGCCAGTCCTGGGTGCTGGTCTTCGCAATCACGATGATGGCGGGACTGCTGG 
TGTCGCTGCTGGTGCGCAGGCGCCGGGTGTGGGCGCGGATCACGCCGACGACCGCGGGTACG 
GTAAACGTCGAGCTGGGCGGCCTGACGCGCACCGACAACTCCGGGTGGGGCGCCGAGTTCGA

GCGGCTGACCGGGCGGTTGCTGGCGGGTTTTGAGGCGCGGTCCCCGGACATGGCCGAAGCGG 

CCGCAGGGACCGGAAGGGACGTCGATTGA 

>Rv0667 rpoB [beta] subunit of RNA polymerase TB.seq 759805:763320 MW:129220
>emb|AL123456|MTBH37RV:759805-763323, rpoB SEQ ID NO:30 

TTGGCAGATTCCCGCCAGAGCAAAACAGCCGCTAGTCCTAGTCCGAGTCGCCCGCAAAGTTCCT 

CGAATAACTCCGTACCCGGAGCGCCAAACCGGGTCTCCTTCGCTAAGCTGCGCGAACCACTTG 

AGGTTCCGGGACTCCTTGACGTCCAGACCGATTCGTTCGAGTGGCTGATCGGTTCGCCGCGCT 

GGCGCGAATCCGCCGCCGAGCGGGGTGATGTCAACCCAGTGGGTGGCCTGGAAGAGGTGCTC 

TACGAGCTGTCTCCGATCGAGGACTTCTCCGGGTCGATGTCGTTGTCGTTCTCTGACCCTCGTT 

TCGACGATGTCAAGGCACCCGTCGACGAGTGCAAAGACAAGGACATGACGTACGCGGCTCCAC 

TGTTCGTCACCGCCGAGTTCATCAACAACAACACCGGTGAGATCAAGAGTCAGACGGTGTTCAT 

GGGTGACTTCCCGATGATGACCGAGAAGGGCACGTTCATCATCAACGGGACCGAGCGTGTGGT 

GGTCAGCCAGCTGGTGCGGTCGCCCGGGGTGTACTTCGACGAGACCATTGACAAGTCCACCGA 

CAAGACGCTGCACAGCGTCAAGGTGATCCCGAGCCGCGGCGCGTGGCTCGAGTTTGACGTCGA 

CAAGCGCGACACCGTCGGCGTGCGCATCGACCGCAAACGCCGGCAACCGGTCACCGTGCTGC 

TCAAGGCGCTGGGCTGGACCAGCGAGCAGATTGTCGAGCGGTTCGGGTTCTCCGAGATCATGC 

GATCGACGCTGGAGAAGGACAACACCGTCGGCACCGACGAGGCGCTGTTGGACATCTACCGCA 

AGCTGCGTCCGGGCGAGCCCCCGACCAAAGAGTCAGCGCAGACGCTGTTGGAAAACTTGTTCT 

TCAAGGAGAAGCGCTACGACCTGGCCCGCGTCGGTCGCTATAAGGTCAACAAGAAGCTCGGGC 

TGCATGTCGGCGAGCCCATCACGTCGTCGACGCTGACCGAAGAAGACGTCGTGGCCACCATCG 

AATATCTGGTCCGCTTGCACGAGGGTCAGACCACGATGACCGTTCCGGGCGGCGTCGAGGTGC 

CGGTGGAAACCGACGACATCGACCACTTCGGCAACCGCCGCCTGCGTACGGTCGGCGAGCTG 

ATCCAAAACCAGATCCGGGTCGGCATGTCGCGGATGGAGCGGGTGGTCCGGGAGCGGATGAC 

CACCCAGGACGTGGAGGCGATCACACCGCAGACGTTGATCAACATCCGGCCGGTGGTCGCCG 

CGATCAAGGAGTTCTTCGGCACCAGCCAGCTGAGCCAATTCATGGACCAGAACAACCCGCTGTC 

GGGGTTGACCCACAAGCGCCGACTGTCGGCGCTGGGGCCCGGCGGTCTGTCACGTGAGCGTG 

CCGGGCTGGAGGTCCGCGACGTGCACCCGTCGCACTACGGCCGGATGTGCCCGATCGAAACC 

CCTGAGGGGCCCAACATCGGTCTGATCGGCTCGCTGTCGGTGTACGCGCGGGTCAACCCGTTC 

GGGTTCATCGAAACGCCGTACCGCAAGGTGGTCGACGGCGTGGTTAGCGACGAGATCGTGTAC 

CTGACCGCCGACGAGGAGGACCGCCACGTGGTGGCACAGGCCAATTCGCCGATCGATGCGGA 

CGGTCGCTTCGTCGAGCCGCGCGTGCTGGTCCGCCGCAAGGCGGGCGAGGTGGAGTACGTGC 

CCTCGTCTGAGGTGGACTACATGGACGTCTCGCCCCGCCAGATGGTGTCGGTGGCCACCGCGA 

TGATTCCCTTCCTGGAGCACGACGACGCCAACCGTGCCCTCATGGGGGCAAACATGCAGCGCC 

AGGCGGTGCCGCTGGTCCGTAGCGAGGCCCCGCTGGTGGGCACCGGGATGGAGCTGCGCGC 

GGCGATCGACGCCGGCGACGTCGTCGTCGCCGAAGAAAGCGGCGTCATCGAGGAGGTGTCGG 
CCGACTACATCACTGTGATGCACGACAACGGCACCCGGCGTACCTACCGGATGCGCAAGTTTG
CCCGGTCCAACCACGGCACTTGCGCCAACCAGTGCCCCATCGTGGACGCGGGCGACCGAGTC 
GAGGCCGGTCAGGTGATCGCCGACGGTCCCTGTACTGACGACGGCGAGATGGCGCTGGGCAA 
GAACCTGCTGGTGGCCATCATGCCGTGGGAGGGCCACAACTACGAGGACGCGATCATCCTGTC 
CAACCGCCTGGTCGAAGAGGACGTGCTCACCTCGATCCACATCGAGGAGCATGAGATCGATGC
TCGCGACACCAAGCTGGGTGCGGAGGAGATCACCCGCGACATCCCGAACATCTCCGACGAGGT 
GCTCGCCGACCTGGATGAGCGGGGCATCGTGCGCATCGGTGCCGAGGTTCGCGACGGGGACA 
TCCTGGTCGGCAAGGTCACCCCGAAGGGTGAGACCGAGCTGACGCCGGAGGAGCGGCTGCTG 
CGTGCCATCTTCGGTGAGAAGGCCCGCGAGGTGCGCGACACTTCGCTGAAGGTGCCGCACGG 

CGAATCCGGCAAGGTGATCGGCATTCGGGTGTTTTCCCGCGAGGACGAGGACGAGTTGCCGGC
CGGTGTCAACGAGCTGGTGCGTGTGTATGTGGCTCAGAAACGCAAGATCTCCGACGGTGACAA 
GCTGGCCGGCCGGCACGGCAACAAGGGCGTGATCGGCAAGATCCTGCCGGTTGAGGACATGC 
CGTTCCTTGCCGACGGCACCCCGGTGGACATTATTTTGAACACCCACGGCGTGCCGCGACGGA 
TGAACATCGGCCAGATTTTGGAGACCCACCTGGGTTGGTGTGCCCACAGCGGCTGGAAGGTCG 

ACGCCGCCAAGGGGGTTCCGGACTGGGCCGCCAGGCTGCCCGACGAACTGCTCGAGGCGCAG
CCGAACGCCATTGTGTCGACGCCGGTGTTCGACGGCGCCCAGGAGGCCGAGCTGCAGGGCCT 
GTTGTCGTGCACGCTGCCCAACCGCGACGGTGACGTGCTGGTCGACGCCGACGGCAAGGCCA 
TGCTCTTCGACGGGCGCAGCGGCGAGCCGTTCCCGTACCCGGTCACGGTTGGCTACATGTACA 
TCATGAAGCTGCACCACCTGGTGGACGACAAGATCCACGCCCGCTCCACCGGGCCGTACTCGA 

TGATCACCCAGCAGCCGCTGGGCGGTAAGGCGCAGTTCGGTGGCCAGCGGTTCGGGGAGATG
GAGTGCTGGGCCATGCAGGCCTACGGTGCTGCCTACACCCTGCAGGAGCTGTTGACCATCAAG 
TCCGATGACACCGTCGGCCGCGTCAAGGTGTACGAGGCGATCGTCAAGGGTGAGAACATCCCG 
GAGCCGGGCATCCCCGAGTCGTTCAAGGTGCTGCTCAAAGAACTGCAGTCGCTGTGCCTCAAC 
GTCGAGGTGCTATCGAGTGACGGTGCGGCGATCGAACTGCGCGAAGGTGAGGACGAGGACCT 

GGAGCGGGCCGCGGCCAACCTGGGAATCAATCTGTCCCGCAACGAATCCGCAAGTGTCGAGGA
TCTTGCGTAA 

>Rv0668 rpoC [beta]' subunit of RNA polymerase TB.seq 763368:767315 MW:146740
>emb|AL123456|MTBH37RV:763368-767318, rpoC SEQ ID NO:31 

GTGCTCGACGTCAACTTCTTCGATGAACTCCGCATCGGTCTTGCTACCGCGGAGGACATCAGGC
AATGGTCCTATGGCGAGGTCAAAAAGCCGGAGACGATCAACTACCGCACGCTTAAGCCGGAGA 
AGGACGGCCTGTTCTGCGAGAAGATCTTCGGGCCGACTCGCGACTGGGAATGCTACTGCGGCA 
AGTACAAGCGGGTGCGCTTCAAGGGCATCATCTGCGAGCGCTGCGGCGTCGAGGTGACCCGC 
GCCAAGGTGCGTCGTGAGCGGATGGGCCACATCGAGCTTGCCGCGCCCGTCACCCACATCTG 

GTACTTCAAGGGTGTGCCCTCGCGGCTGGGGTATCTGCTGGACCTGGCCCCGAAGGACCTGGA
GAAGATCATCTACTTCGCTGCCTACGTGATCACCTCGGTCGACGAGGAGATGCGCCACAATGAG 
CTCTCCACGCTCGAGGCCGAAATGGCGGTGGAGCGCAAGGCCGTCGAAGACCAGCGCGACGG 
CGAACTAGAGGCCCGGGCGCAAAAGCTGGAGGCCGACCTGGCCGAGCTGGAGGCCGAGGGC
GCCAAGGCCGATGCGCGGCGCAAGGTTCGCGACGGCGGCGAGCGCGAGATGCGCCAGATCC 
GTGACCGCGCGCAGCGTGAGCTGGACCGGTTGGAGGACATCTGGAGCACTTTCACCAAGCTGG 
CGCCCAAGCAGCTGATCGTCGACGAAAACCTCTACCGCGAACTCGTCGACCGCTACGGCGAGT 
ACTTCACCGGTGCCATGGGCGCGGAGTCGATCCAGAAGCTGATCGAGAACTTCGACATCGACG
CCGAAGCCGAGTCGCTGCGGGATGTCATCCGAAACGGCAAGGGGCAGAAGAAGCTTCGCGCC 
CTCAAGCGGCTGAAGGTGGTTGCGGCGTTCCAACAGTCGGGCAACTCGCCGATGGGCATGGTG 
CTCGACGCCGTCCCGGTGATCCCGCCGGAGCTGCGCCCGATGGTGCAGCTCGACGGCGGCCG 
GTTCGCCACGTCCGACTTGAACGACCTGTACCGCAGGGTGATCAACCGCAACAACCGGCTGAA 

AAGGCTGATCGATCTGGGTGCGCCGGAAATCATCGTCAACAACGAGAAGCGGATGCTGCAGGA
ATCCGTGGACGCGCTGTTCGACAATGGCCGCCGCGGCCGGCCCGTCACCGGGCCGGGCAACC 
GTCCGCTCAAGTCGCTTTCCGATCTGCTCAAGGGCAAGCAGGGCCGGTTCCGGCAGAACCTGC 
TCGGCAAGCGTGTCGACTACTCGGGCCGGTCGGTCATCGTGGTCGGCCCGCAGCTCAAGCTGC 
ACCAGTGCGGTCTGCCCAAGCTGATGGCGCTGGAGCTGTTCAAGCCGTTCGTGATGAAGCGGC 

TGGTGGACCTCAACCATGCGCAGAACATCAAGAGCGCCAAGCGCATGGTGGAGCGCCAGCGCC
CCCAAGTGTGGGATGTGCTCGAAGAGGTCATCGCCGAGCACCCGGTGTTGCTGAACCGCGCAC 
CCACCCTGCACCGGTTGGGTATCCAGGCCTTCGAGCCAATGCTGGTGGAAGGCAAGGCCATTC 
AGCTGCACCCGTTGGTGTGTGAGGCGTTCAATGCCGACTTCGACGGTGACCAGATGGCCGTGC 
ACCTGCCTTTGAGCGCCGAAGCGCAGGCCGAGGCTCGCATTTTGATGTTGTCCTCCAACAACAT 

CCTGTCGCCGGCATCTGGGCGTCCGTTGGCCATGCCGCGGCTGGACATGGTGACCGGGCTGT
ACTACCTGACCACCGAGGTCCCCGGGGACACCGGCGAATACCAGCCGGCCAGCGGGGATCAC 
CCGGAGACTGGTGTCTACTCTTCGCCGGCCGAAGCGATCATGGCGGCCGACCGCGGTGTCTTG 
AGCGTGCGGGCCAAGATCAAGGTGCGGCTGACCCAGCTGCGGCCGCCGGTCGAGATCGAGGC 
CGAGCTATTCGGCCACAGCGGCTGGCAGCCGGGCGATGCGTGGATGGCCGAGACCACGCTGG 

GCCGGGTGATGTTCAACGAGCTGCTGCCGCTGGGTTATCCGTTCGTCAACAAGCAGATGCACAA
GAAGGTGCAGGCCGCCATCATCAACGACCTGGCCGAGCGTTACCCGATGATCGTGGTCGCCCA 
GACCGTCGACAAGCTCAAGGACGCCGGCTTCTACTGGGCCACCCGCAGCGGCGTGACGGTGT 
CGATGGCCGACGTGCTGGTGCCGCCGCGCAAGAAGGAGATCCTCGACCACTACGAGGAGCGC 
GCGGACAAGGTCGAAAAGCAGTTCCAGCGTGGCGCTTTGAACCACGACGAGCGCAACGAGGC 

GCTGGTGGAGATTTGGAAGGAAGCCACCGACGAGGTCGGTCAGGCGTTGCGGGAGCACTACC
CCGACGACAACCCGATCATCACCATCGTCGACTCCGGCGCCACCGGCAACTTCACCCAGACTC 
GAACGCTGGCCGGTATGAAGGGCCTGGTGACCAACCCGAAGGGTGAGTTCATCCCGCGTCCG 
GTCAAGTCCTCCTTCCGTGAGGGCCTGACCGTGCTGGAGTACTTCATCAACACCCACGGCGCTC 
GAAAGGGCTTGGCGGACACCGCGTTGCGCACCGCCGACTCCGGCTACCTGACCCGACGTCTG 

GTGGACGTGTCCCAGGACGTGATCGTGCGCGAGCACGACTGCCAGACCGAGCGCGGCATCGT
CGTCGAGCTGGCCGAGCGTGCACCCGACGGCACGCTGATCCGCGACCCGTACATCGAAACCTC 
GGCCTACGCGCGGACCCTGGGCACCGACGCGGTCGACGAGGCCGGCAACGTCATCGTCGAGC 
GTGGTCAAGACCTGGGCGATCCGGAGATTGACGCTCTGTTGGCTGCTGGTATTACCCAGGTCAA
GGTGCGTTCGGTGCTGACGTGTGCCACCAGCACCGGCGTGTGCGCGACCTGCTACGGGCGTT 
CCATGGCCACCGGCAAGCTGGTCGACATCGGTGAAGCCGTCGGCATCGTGGCCGCCCAGTCC 
ATCGGCGAACCCGGCACCCAGCTGACCATGCGCACCTTCCACCAGGGTGGCGTCGGTGAGGA 

CATCACCGGTGGTCTGCCCCGGGTGCAGGAGCTGTTCGAGGCCCGGGTACCGCGTGGCAAGG
CGCCGATCGCCGACGTCACCGGCCGGGTTCGGCTCGAGGACGGCGAGCGGTTCTACAAGATC 
ACCATCGTTCCTGACGACGGCGGTGAGGAAGTGGTCTACGACAAGATCTCCAAGCGGCAGCGG 
CTGCGGGTGTTCAAGCACGAAGACGGTTCCGAACGGGTGCTCTCCGATGGCGACCACGTCGAG 
GTGGGCCAGCAGCTGATGGAAGGCTCGGCCGACCCGCATGAGGTGCTGCGGGTGCAGGGCCC 

CCGCGAGGTGCAGATACACCTGGTTCGCGAGGTCCAGGAGGTCTACCGCGCCCAAGGTGTGTC
GATCCACGACAAGCACATCGAGGTGATCGTTCGCCAGATGCTGCGCCGGGTGACCATCATCGA 
CTCGGGCTCGACGGAGTTTTTGCCTGGCTCGCTGATCGACCGCGCGGAGTTCGAGGCAGAGAA 
CCGCCGAGTGGTGGCCGAGGGCGGTGAGCCCGCGGCCGGCCGTCCGGTGCTGATGGGCATC 
ACGAAGGCGTCGCTGGCCACCGACTCGTGGCTGTCGGCGGCGTCGTTCCAGGAGACCACTCG 

CGTGCTGACCGATGCGGCGATCAACTGCCGCAGCGATAAGCTCAACGGTCTGAAGGAAAACGT
GATCATCGGCAAGCTGATCCCGGCCGGTACCGGTATCAACCGCTACCGCAACATCGCGGTGCA 
GCCCACCGAGGAGGCCCGCGCTGCGGCGTACACCATCCCGTCGTATGAGGATCAGTACTACAG 
CCCGGACTTCGGTGCGGCCACCGGTGCTGCCGTCCCGCTGGACGACTACGGCTACAGCGACTA 
CCGCTAG 

>Rv0711 atsA TB.seq 806333:808693 MW:86216
>emb|AL123456|MTBH37RV:806333-808696, atsA SEQ ID NO:32 

ATGGCACCCGAGGCCACCGAGGCGTTCAACGGCACCATCGAGCTGGATATTCGTGATTCGGAG 
CCGGATTGGGGCCCATACGCAGCGCCGGTGGCACCGGAGCACTCACCAAACATCCTGTATCTG 

GTCTGGGACGACGTCGGCATCGCGACCTGGGACTGCTTTGGCGGCCTGGTCGAGATGCCCGC
GATGACGCGCGTCGCCGAGCGTGGCGTGCGACTGTCGCAATTTCACACCACCGCACTGTGCTC 
GCCGACCCGGGCGTCGCTGCTGACCGGTCGCAACGCCACCACCGTAGGCATGGCTACCATCG 
AAGAGTTCACCGACGGGTTCCCCAACTGCAACGGGCGGATCCCGGCTGACACCGCGTTGCTCC 
CAGAGGTGCTGGCCGAACATGGCTACAACACCTACTGTGTGGGCAAGTGGCACCTGACGCCAC 

TCGAAGAATCCAATATGGCGTCGACGAAGCGGCACTGGCCGACCTCGCGTGGGTTCGAGCGGT
TCTACGGATTCCTAGGCGGGGAGACCGACCAGTGGTATCCCGACCTGGTATACGACAACCACC 
CAGTGAGTCCTCCCGGCACACCCGAGGGTGGCTACCACCTGTCAAAAGACATCGCCGACAAGA 
CGATCGAGTTCATTCGTGATGCCAAGGTGATCGCGCCCGACAAGCCGTGGTTCAGCTACGTGTG 
CCCAGGCGCCGGGCATGCGCCGCACCACGTCTTCAAGGAATGGGCGGACAGATACGCCGGCC 

GATTCGACATGGGGTATGAGCGCTATCGCGAGATCGTGCTGGAAAGGCAAAAGGCGCTAGGGA
TCGTGCCACCCGACACCGAACTGTCGCCCATAAACCCTTATCTGGATGTGCCGGGGCCAAACG 
GCGAGACCTGGCCGCTGCAGGACACGGTGCGGCCGTGGGACTCGCTGAGCGATGAAGAAAAG 
AAGCTGTTTTGCCGGATGGCCGAGGTGTTCGCCGGCTTTCTGAGCTACACCGACGCCCAGATC
GGACGGATCCTGGACTACCTCGAGGAATCCGGCCAGCTGGACAACACCATCATCGTGGTGATC 
TCCGACAACGGCGCCAGCGGCGAGGGCGGACCCAACGGATCGGTCAACGAAGGCAAGTTCTT 
CAACGGCTACATCGACACCGTCGCTGAAAGCATGAAGCTCTTCGACCACCTCGGTGGCCCGCA 
GACCTACAACCACTACCCCATCGGGTGGGCAATGGCCTTCAACACCCCCTACAAGCTGTTCAAG
CGCTACGCCTCGCATGAAGGCGGCATTGCCGACCCGGCAATCATCTCCTGGCCCAACGGCATT 
GCCGCACACGGTGAAATCCGCGACAACTACGTCAATGTCAGCGACATCACGCCCACCGTCTAC 
GACCTGTTGGGCATGACACCGCCGGGGACCGTCAAGGGGATTCCGCAGAAACCGATGGACGG 
CGTGAGCTTCATAGCGGCCCTTGCCGACCCGGCCGCCGACACCGGCAAGACCACCCAGTTCTA 
CACCATGCTGGGCACCCGCGGGATCTGGCATGAAGGTTGGTTCGCCAACACCATTCACGCGGC
CACGCCCGCCGGCTGGTCGAATTTCAACGCTGACCGCTGGGAACTGTTCCACATCGCAGCAGA 
CCGCAGCCAGTGCCACGACCTGGCCGCCGAGCATCCCGACAAACTTGAGGAGCTCAAGGCGCT 
GTGGTTCTCCGAAGCCGCCAAGTACAACGGGCTGCCGCTGGCCGATCTGAACCTCCTGGAAAC 
GATGACTCGGTCGCGGCCTTACCTGGTCAGCGAACGAGCCAGCTACGTCTACTATCCCGACTG 
CGCTGACGTCGGCATCGGCGCGGCCGTAGAGATTCGCGGGCGCTCGTTCGCCGTGCTGGCCG
ATGTGACCATCGATACCACCGGCGCCGAGGGCGTGCTGTTCAAGCACGGCGGCGCCCATGGC 
GGGCACGTGCTGTTCGTCCGGGACGGACGCTTGCACTACGTCTACAACTTCCTCGGTGAGCGC 
CAGCAGCTGGTCAGCTCGTCGGGTCCGGTCCCGTCGGGAAGACATCTACTCGGGGTTCGTTAT 
TTGCGGACCGGAACCGTGCCCAACAGTCACACGCCGGTGGGCGATCTTGAGCTGTTCTTCGAC 
GAGAACCTGGTCGGCGCCCTGACCAATGTGCTGACCCACCCTGGAACGTTCGGGTTGGCCGGC
GCCGCTATCAGCGTTGGCCGCAACGGCGGTTCGGCTGTGTCCAGCCACTACGAAGCGCCGTTC 
GCGTTCACCGGCGGTACCATCACCCAGGTCACCGTCGACGTGTCAGGCCGACCGTTCGAAGAT 
GTGGAATCCGATCTTGCGCTTGCTTTTTCGCGTGACTGA 

>Rv0764c - lanosterol 14-demethylase cytochrome P450 TB.seq 856683:858035 MW:50879 
>emb|AL123456|MTBH37RV:c858035-856680, Rv0764c SEQ ID NO:33 

ATGAGCGCTGTTGCACTACCCCGGGTTTCGGGTGGCCACGACGAACACGGCCACCTCGAGGAG 
TTCCGCACCGATCCGATCGGGCTGATGCAACGGGTCCGCGACGAATGCGGAGACGTCGGTACC 
TTCCAGCTGGCCGGGAAGCAGGTCGTGCTGCTGTCCGGCTCGCACGCCAACGAATTCTTCTTC 
CGGGCGGGCGACGACGACCTGGACCAGGCCAAGGCATACCCGTTCATGACGCCGATCTTCGG 
CGAGGGCGTGGTGTTCGACGCCAGCCCGGAACGGCGTAAAGAGATGCTGCACAATGCCGCGC 
TACGCGGCGAGCAGATGAAGGGCCACGCTGCCACCATCGAAGATCAAGTCCGACGGATGATCG 
CCGACTGGGGTGAGGCCGGCGAGATCGATCTGCTGGACTTCTTCGCCGAGCTGACCATCTACA 
CCTCCTCGGCCTGCCTGATCGGCAAGAAGTTCCGCGACCAGCTCGACGGGCGATTCGCCAAGC 
TCTATCACGAGTTGGAGCGCGGCACCGACCCACTAGCCTACGTCGACCCGTATCTGCCGATCG 
AGAGCTTCCGTCGCCGCGACGAAGCCCGCAATGGTCTGGTGGCACTGGTTGCGGACATCATGA 
ACGGCCGGATCGCCAACCCACCCACCGACAAGAGCGACCGTGACATGCTCGACGTGCTCATCG 
CCGTCAAGGCTGAGACCGGCACTCCCCGGTTCTCGGCCGACGAGATCACCGGCATGTTCATCT
CGATGATGTTCGCCGGCCATCACACCAGCTCGGGTACGGCTTCGTGGACGCTGATCGAGTTGA 
TGCGCCATCGCGACGCCTACGCGGCCGTGATCGACGAACTCGACGAGCTGTACGGCGACGGC 
CGATCGGTGAGTTTCCATGCGCTGCGCCAGATTCCGCAGCTGGAAAACGTGCTGAAAGAGACG 
CTGCGCCTGCACCCTCCGCTGATCATCCTCATGCGAGTGGCCAAGGGCGAGTTCGAGGTGCAA
GGCCACCGGATTCATGAGGGCGATCTGGTGGCGGCCTCCCCGGCGATCTCCAACCGGATCCCC 
GAAGACTTCCCCGATCCCCACGACTTCGTGCCAGCACGATACGAGCAGCCGCGCCAGGAAGAT 
CTGCTCAACCGCTGGACGTGGATTCCGTTCGGCGCCGGCCGGCATCGTTGCGTGGGGGCGGC 
GTTCGCCATCATGCAGATCAAAGCGATCTTCTCGGTGTTGTTGCGCGAGTATGAGTTTGAGATG 
GCGCAACCGCCAGAAAGCTATCGTAACGACCATTCGAAGATGGTGGTGCAGTTGGCCCAGCCC
GCTTGCGTGCGCTACCGCCGGCGAACGGGAGTTTAA 

>Rv0861c - DNA helicase TB.seq 958524:960149 MW:59773 
>emb|AL123456|MTBH37RV:c960149-958521, Rv0861c SEQ ID NO:34

GTGCAGTCCGATAAGACGGTGCTGTTGGAAGTCGACCATGAACTGGCCGGCGCTGCACGCGCC
GCCATCGCGCCGTTCGCCGAGCTGGAACGTGCACCCGAACATGTCCACACCTACCGCATCACA 
CCGCTGGCACTGTGGAATGCTCGCGCCGCCGGCCATGATGCCGAGCAAGTCGTCGACGCGCT 
GGTCAGTTACTCCCGCTACGCGGTGCCGCAACCCTTGCTCGTCGACATCGTCGACACCATGGC 
CCGCTACGGACGACTGCAGTTGGTCAAGAACCCGGCCCATGGCCTGACGCTGGTGAGCCTGGA 

CCGCGCGGTGCTTGAGGAAGTGCTGCGCAACAAGAAGATCGCGCCGATGCTTGGCGCCCGCAT
CGATGACGACACCGTCGTCGTCCACCCCAGCGAACGCGGCCGGGTCAAGCAGCTGCTGCTCAA 
GATCGGTTGGCCCGCAGAGGATCTCGCCGGCTACGTCGATGGTGAAGCGCACCCGATCAGCCT 
GCAGCAGGAGGGCTGGCAGCTGCGCGA7TACCAGCGGCTGGCCGCGGACTCGTTCTGGGCGG 
GCGGCTCCGGGGTGGTGGTGCTGCCATGTGGGGCCGGCAAGACGCTGGTCGGTGCGGCCGC 

AATGGCCAAAGCCGGCGCGACGACGTTGATCCTGGTCACCAATATCGTCGCGGCCCGGCAATG
GAAACGAGAGCTGGTCGCGCGCACCTCGCTCACCGAGAATGAGATCGGCGAATTCTCGGGAGA 
ACGCAAGGAAATCCGACCTGTCACCATCTCGACATACCAGATGATCACCCGCCGCACTAAGGGC 
GAGTACCGCCATCTGGAACTGTTCGACAGCCGCGACTGGGGGCTCATCATCTATGACGAGGTG 
CACCTGTTGCCGGCACCGGTCTTCCGGATGACCGCTGACCTGCAGTCCAAACGGCGGCTGGGG 

CTGACCGCCACGTTGATCCGTGAAGACGGACGCGAGGGCGACGTGTTTTCCCTTATCGGACCA
AAGCGCTATGACGCGCCGTGGAAGGACATTGAGGCGCAGGGCTGGATCGCGCCAGCTGAGTG 
CGTGGAAGTCCGGGTCACGATGACCGACAGCGAGCGGATGATGTACGCCACCGCCGAACCCG 
AAGAACGCTACCGGATCTGCTCGACGGTGCACACCAAAATTGCTGTGGTCAAGTCGATTCTGGC 
GAAGCACCCGGATGAGCAGACCCTGGTCATCGGAGCGTACTTGGATCAGCTCGACGAGCTGGG 

CGCCGAGCTCGGCGCTCCGGTGATTCAGGGGTCGACAAGGACCAGCGAACGCGAGGCACTGT
TCGACGCCTTCCGCCGCGGCGAGGTCGCTACGCTCGTGGTGTCCAAGGTGGCTAACTTCTCCA 
TCGACTTGCCGGAAGCCGCCGTGGCGGTACAGGTTTCGGGAACATTCGGCTCACGCCAGGAAG 
AGGCGCAACGGCTCGGCCGGATATTGCGACCCAAGGCCGACGGGGGCGGTGCCATCTTCTAC

TCGGTGGTGGCCCGCGACAGCCTGGATGCCGAGTACGCCGCACACCGGCAGCGGTTTTTAGCT 

GAGCAGGGCTACGGTTACATCATCCGCGACGCCGACGACCTGCTGGGCCCGGCAATTTAG 

>Rv0904c accD3 TB.seq 1006694:1008178 MW:51741

>emb|AL123456|MTBH37RV:c1008178-1006691, accD3 SEQ ID NO:35

GTGAGTCGTATCACGACCGACCAACTGCGGCACGCGGTGCTAGACCGGGGATCTTTCGTCAGC 
TGGGATAGCGAGCCGCTGGCGGTGCCGGTAGCCGACTCCTATGCGCGGGAGCTGGCCGCCGC 
TCGGGCGGCCACCGGCGCGGACGAATCGGTGCAGACCGGTGAGGGACGCGTATTCGGGCGG 

CGGGTGGCCGTGGTGGCCTGTGAGTTCGACTTCCTGGGCGGCTCGATTGGGGTGGCAGCGGC
CGAACGGATCACCGCCGCCGTCGAGCGGGCGACCGCCGAGCGGCTGCCGCTACTGGCGTCAC 
CAAGCTCGGGAGGCACCCGCATGCAAGAAGGCACGGTCGCGTTTCTGCAGATGGTGAAGATCG 
CTGCGGCCATCCAGCTGCACAACCAGGCGCGCCTGCCCTACCTGGTCTATTTGCGCCATCCGA 
CCACGGGTGGAGTTTTCGCGTCGTGGGGCTCGCTGGGGCATCTCACCGTCGCCGAGCCGGGC 

GCCCTGATCGGCTTTCTGGGACCACGGGTCTATGAGTTGCTCTATGGCGACCCCTTCCCATCCG
GCGTCCAAACCGCCGAGAATCTACGGCGGCATGGGATCATCGACGGCGTCGTTGCACTGGACC 
GGCTACGACCGATGCTGGATCGTGCGTTGACGGTGCTCATCGACGCTCCCGAACCGCTTCCGG 
CACCGCAGACGCCCGCGCCCGTACCCGATGTGCCCACGTGGGACTCGGTGGTGGCATCGCGC 
CGGCCGGACCGGCCGGGCGTCAGGCAGCTACTGCGACACGGCGCCACCGACCGGGTGTTGTT 

GTCAGGAACCGATCAAGGCGAAGCGGCGACCACGCTGCTGGCGCTGGCCCGCTTTGGCGGCC
AACCCACGGTGGTCCTCGGCCAGCAAAGGGCAGTAGGCGGCGGGGGAAGCACTGTCGGGCCC 
GCTGCGTTACGCGAAGCCCGACGCGGGATGGCGCTCGCCGCCGAGCTGTGCCTGCCGCTGGT 
GCTGGTCATTGACGCGGCCGGACCCGCGTTGTCGGCCGCAGCCGAACAGGGCGGGCTGGCCG 
GCCAGATCGCGCATTGCCTGGCCGAGCTCGTCACGCTGGATACCCCGACCGTGTCGATCCTGC 

TGGGCCAGGGCAGCGGCGGGCCGGCGCTGGCGATGTTGCCCGCCGACCGGGTGCTGGCCGC
ACTCCACGGCTGGCTGGCGCCCTTGCCTCCCGAAGGAGCCAGCGCGATCGTGTTCCGAGACAC 
TGCTCATGCCGCCGAACTCGCTGCCGCCCAAGGCATCCGGTCGGCCGACCTACTGAAGTCGGG 
GATTGTCGACACCATCGTGCCGGAGTACCCCGACGCCGCAGACGAGCCGATCGAGTTCGCCCT 
ACGACTGTCGAACGCCATCGCCGCCGAAGTGCACGCGTTACGGAAGATACCGGCCCCGGAACG 

CCTCGCGACTCGGTTGCAACGCTACCGCCGGATCGGGTTGCCCCGCGACTAA

>Rv0983 - TB.seq 1099064:1100455 MW:46454

>emb|AL123456|MTBH37RV:1099064-1100458, Rv0983 SEQ ID NO:36

ATGGCCAAGTTGGCCCGAGTAGTGGGCCTAGTACAGGAAGAGCAACCTAGCGACATGACGAAT 
CACCCACGGTATTCGCCACCGCCGCAGCAGCCGGGAACCCCAGGTTATGCTCAGGGGCAGCA
GCAAACGTACAGCCAGCAGTTCGACTGGCGTTACCCACCGTCCCCGCCCCCGCAGCCAACCCA 
GTACCGTCAACCCTACGAGGCGTTGGGTGGTACCCGGCCGGGTCTGATACCTGGCGTGATTCC 
GACCATGACGCCCCCTCCTGGGATGGTTCGCCAACGCCCTCGTGCAGGCATGTTGGCCATCGG
CGCGGTGACGATAGCGGTGGTGTCCGCCGGCATCGGCGGCGCGGCCGCATCCCTGGTCGGGT 
TCAACCGGGCACCCGCCGGCCCCAGCGGCGGCCCAGTGGCTGCCAGCGCGGCGCCAAGCAT 
CCCCGCAGCAAACATGCCGCCGGGGTCGGTCGAACAGGTGGCGGCCAAGGTGGTGCCCAGTG 
TCGTCATGTTGGAAACCGATCTGGGCCGCCAGTCGGAGGAGGGCTCCGGCATCATTCTGTCTG
CCGAGGGGCTGATCTTGACCAACAACCACGTGATCGCGGCGGCCGCCAAGCCTCCCCTGGGC 
AGTCCGCCGCCGAAAACGACGGTAACCTTCTCTGACGGGCGGACCGCACCCTTCACGGTGGTG 
GGGGCTGACCCCACCAGTGATATCGCCGTCGTCCGTGTTCAGGGCGTCTCCGGGCTCACCCCG 
ATCTCCCTGGGTTCCTCCTCGGACCTGAGGGTCGGTCAGCCGGTGCTGGCGATCGGGTCGCCG 

CTCGGTTTGGAGGGCACCGTGACCACGGGGATCGTCAGCGCTCTCAACCGTCCAGTGTCGACG
ACCGGCGAGGCCGGCAACCAGAACACCGTGCTGGACGCCATTCAGACCGACGCCGCGATCAA 
CCCCGGTAACTCCGGGGGCGCGCTGGTGAACATGAACGCTCAACTCGTCGGAGTCAACTCGGC 
CATTGCCACGCTGGGCGCGGACTCAGCCGATGCGCAGAGCGGCTCGATCGGTCTCGGTTTTGC 
GATTCCAGTCGACCAGGCCAAGCGCATCGCCGACGAGTTGATCAGCACCGGCAAGGCGTCACA 

TGCCTCCCTGGGTGTGCAGGTGACCAATGACAAAGACACCCTGGGCGCCAAGATCGTCGAAGT
AGTGGCCGGTGGTGCTGCCGCGAACGCTGGAGTGCCGAAGGGCGTCGTTGTCACCAAGGTCG 
ACGACCGCCCGATCAACAGCGCGGACGCGTTGGTTGCCGCCGTGCGGTCCAAAGCGCCGGGC 
GCCACGGTGGCGCTAACCTTTCAGGATCCCTCGGGCGGTAGCCGCACAGTGCAAGTCACCCTC 
GGCAAGGCGGAGCAGTGA 

>Rv1008 - Similar to E.coli protein YcfH TB.seq 1127087:1127878 MW:29066
>emb|AL123456|MTBH37RV:1127087-1127881, Rv1008 SEQ ID NO:37

TTGGTCGACGCCCACACCCATCTCGACGCGTGCGGTGCACGAGACGCCGATACGGTGCGGTC 
GCTCGTCGAGCGAGCCGCCGCGGCCGGCGTGACCGCGGTGGTCACCGTCGCCGACGACCTG 

GAGTCCGCGCGCTGGGTCACCCGCGCGGCCGAATGGGATCGGCGAGTCTATGCCGCGGTGGC
GTTGCACCCGACCCGCGCCGATGCGCTCACCGACGCTGCCCGTGCCGAGCTCGAGCGATTGG 
TTGCCCACCCCAGGGTGGTGGCCGTCGGTGAGACCGGAATCGACATGTACTGGCCGGGTCGC 
CTGGACGGGTGTGCGGAGCCGCACGTCCAGCGGGAGGCCTTTGCCTGGCATATCGATCTGGC 
CAAGCGGACCGGTAAACCGCTGATGATCCACAATCGTCAGGCCGACCGCGACGTGCTGGACGT 

GCTGCGGGCCGAGGGCGCGCCGGACACCGTGATCTTGCACTGCTTCTCGTCGGACGCGGCGA
TGGCCCGCACGTGTGTGGACGCCGGGTGGCTGCTCAGCCTGTCCGGGACGGTGAGCTTCCGT 
ACCGCCCGTGAACTACGGGAAGCCGTCCCGCTGATGCCGGTGGAGCAGCTTTTGGTGGAAACC 
GATGCACCGTATTTGACCCCGCATCCCCACCGGGGCTTGGCGAACGAACCGTACTGCCTGCCC 
TATACCGTGCGGGCGCTGGCTGAACTGGTCAATCGGCGCCCCGAAGAGGTGGCGCTCATCACC 

ACAAGCAACGCTCGCCGAGCTTATGGGCTAGGGTGGATGCGCCAATGA

>Rv1009 - lipoprotein, similar to various other MTB proteins TB.seq 1128089:1129174 MW:38079
>emb|AL123456|MTBH37RV:1128089-1129177, Rv1009 SEQ ID NO:38

ATGTTGCGCCTGGTAGTCGGTGCGCTGCTGCTGGTGTTGGCGTTCGCCGGTGGCTATGCGGTC 

GCCGCATGCAAAACGGTGACGTTGACCGTCGACGGAACCGCGATGCGGGTGACCACGATGAAA 

TCGCGGGTGATCGACATCGTCGAAGAGAACGGGTTCTCAGTCGACGACCGCGACGACCTGTAT 

CCCGCGGCCGGCGTGCAGGTCCATGACGCCGACACCATCGTGCTGCGGCGTAGCCGTCCGCT 

GCAGATCTCGCTGGATGGTCACGACGCTAAGCAGGTGTGGACGACCGCGTCGACGGTGGACG 

AGGCGCTGGCCCAACTCGCGATGACCGACACGGCGCCGGCCGCGGCTTCTCGCGCCAGCCGC 

GTCCCGCTGTCCGGGATGGCGCTACCGGTCGTCAGCGCCAAGACGGTGCAGCTCAACGACGG 

CGGGTTGGTGCGCACGGTGCACTTGCCGGCCCCCAATGTCGCGGGGCTGCTGAGTGCGGCCG 

GCGTGCCGCTGTTGCAAAGCGACCACGTGGTGCCCGCCGCGACGGCCCCGATCGTCGAAGGC 

ATGCAGATCCAGGTGACCCGCAATCGGATCAAGAAGGTCACCGAGCGGCTGCCGCTGCCGCCG 

AACGCGCGTCGTGTCGAGGACCCGGAGATGAACATGAGCCGGGAGGTCGTCGAAGACCCGGG 

GGTTCCGGGGACCCAGGATGTGACGTTCGCGGTAGCTGAGGTCAACGGCGTCGAGACCGGCC 

GTTTGCCCGTCGCCAACGTCGTGGTGACCCCGGCCCACGAAGCCGTGGTGCGGGTGGGCACC 

AAGCCCGGTACCGAGGTGCCCCCGGTGATCGACGGAAGCATCTGGGACGCGATCGCCGGCTG 

TGAGGCCGGTGGCAACTGGGCGATCAACACCGGCAACGGGTATTACGGTGGTGTGCAGTTTGA 

CCAGGGCACCTGGGAGGCCAACGGCGGGCTGCGGTATGCACCCCGCGCTGACCTCGCCACCC 

GCGAAGAGCAGATCGCCGTTGCCGAGGTGACCCGACTGCGTCAAGGTTGGGGCGCCTGGCCG 

GTATGTGCTGCACGAGCGGGTGCGCGCTGA 

>Rv1010 ksgA 16S rRNA dimethyltransferase TB.seq 1129150:1130100 MW:34647
>emb|AL123456|MTBH37RV:1129150-1130103, ksgA SEQ ID NO:39

ATGTGCTGCACGAGCGGGTGCGCGCTGACCATCCGGCTGCTCGGGCGCACTGAGATCAGGCG 

GCTGGCCAAAGAGCTCGACTTTCGGCCGCGCAAATCTCTCGGACAGAACTTCGTGCACGACGC 

CAACACGGTGCGACGGGTGGTTGCCGCCTCCGGGGTCAGCCGTTCCGACCTGGTTTTGGAGGT 

CGGGCCGGGCCTGGGATCGCTGACCCTGGCACTGCTCGACCGCGGCGCGACCGTCACCGCGG 

TCGAGATCGATCCACTACTGGCTTCTCGGCTGCAACAGACCGTGGCGGAGCACTCGCACAGCG 

AGGTTCACCGACTAACGGTGGTCAATCGCGACGTCCTGGCCCTGCGCCGGGAGGATCTAGCCG 

CGGCGCCGACCGCGGTGGTTGCCAATCTGCCGTACAACGTAGCGGTACCGGCGTTGTTGCATC 

TGCTTGTCGAGTTCCCGTCGATCCGTGTCGTGACGGTGATGGTGCAGGCCGAGGTCGCCGAAC 

GGCTCGCCGCCGAGCCGGGCAGCAAAGAGTACGGCGTGCCCAGCGTTAAGCTGCGCTTCTTC 

GGGCGGGTTCGCCGCTGCGGCATGGTGTCGCCGACCGTTTTCTGGCCCATTCCGCGTGTCTAT 

TCCGGGCTGGTACGCATCGATCGATATGAGACCTCGCCCTGGCCCACCGACGACGCTTTTCGA 

CGGCGGGTATTCGAACTCGTGGACATCGCATTCGCGCAGCGGCGCAAGACTTCTCGCAACGCG 

TTTGTGCAGTGGGCGGGCTCGGGAAGCGAGTCGGCGAATCGATTGTTGGCGGCCAGCATCGAC 

CCCGCCCGTCGCGGTGAGACGCTGTCCATCGACGACTTCGTGCGGCTGCTGCGACGGTCCGG 
CGGCTCCGACGAGGCCACCAGCACCGGCCGGGACGCCAGGGCGCCGGACATTTCGGGGCAC
GCGTCGGCGAGCTGA 

>Rv1011 - Homology to E.coli protein YcbH TB.seq 1130189:1131106 MW:31350

>emb|AL123456|MTBH37RV:1130189-1131109, Rv1011 SEQ ID NO:40

GTGCCCACCGGGTCGGTCACCGTTCGGGTGCCCGGAAAGGTCAACCTCTATCTGGCGGTCGGC 
GATCGCCGCGAGGACGGCTATCACGAGCTGACCACGGTATTTCATGCCGTCTCGCTGGTCGAC 
GAGGTAACCGTTCGTAACGCTGATGTGCTCTCGCTCGAGTTGGTCGGCGAGGGGGCCGACCAG 
CTGCCGACCGACGAACGCAATCTCGCCTGGCAGGCGGCCGAGCTGATGGCCGAACACGTGGG 

CCGGGCGCCGGACGTCTCGATCATGATCGACAAATCCATTCCGGTCGCCGGCGGCATGGCCG
GTGGCAGCGCGGACGCTGCGGCGGTCCTGGTTGCGATGAACTCGTTGTGGGAACTCAATGTGC 
CCCGCCGCGACCTGCGCATGCTCGCCGCGCGGCTAGGCAGCGATGTGCCGTTTGCCCTGCAT 
GGTGGTACCGCGCTGGGGACGGGTCGCGGCGAGGAGTTGGCCACCGTGTTATCCCGCAACAC 
CTTCCACTGGGTCCTGGCGTTCGCCGACAGCGGGTTGCTCACCTCCGCGGTGTACAACGAGCT 

CGACCGGCTCAGGGAGGTGGGGGATCCGCCCCGGCTTGGTGAGCCCGGGCCGGTTCTGGCTG
CCTTAGCTGCGGGTGATCCGGATCAGCTGGCGCCGTTGCTGGGTAATGAAATGCAAGCGGCCG 
CGGTGAGCCTGGACCCGGCGCTGGCTCGTGCGTTACGCGCCGGTGTGGAGGCCGGCGCGCTC 
GCAGGCATCGTGTCCGGTTCGGGTCCCACGTGTGCCTTCCTGTGCACCTCGGCGAGCTCGGCG 
ATCGATGTCGGCGCGCAGCTGTCGGGGGCGGGAGTTTGTCGCACCGTTCGAGTCGCCACCGG 

GCCGGTACCCGGCGCCCGCGTGGTGTCTGCGCCGACCGAAGTGTGA

>Rv1106c - cholesterol dehydrogenase TB.seq 1232845:1233954 MW:40743
>emb|AL123456|MTBH37RV:c1233954-1232842, Rv1106c SEQ ID NO:41 

ATGCTTCGCCGCATGGGTGATGCATCGCTGACAACCGAGCTCGGCCGCGTTCTGGTCACCGGC 
GGCGCGGGCTTCGTGGGCGCCAACCTGGTGACCACCTTGCTGGACCGCGGGCACTGGGTGCG
TTCCTTCGACCGCGCGCCGTCGCTGTTGCCTGCGCATCCGCAACTGGAGGTGCTGCAAGGGGA 
CATCACCGACGCGGACGTCTGCGCCGCGGCCGTGGACGGCATCGACACGATCTTCCACACCG 
CAGCGATCATCGAGCTGATGGGCGGCGCGTCGGTCACCGACGAGTACCGCCAACGTAGCTTTG 
CGGTCAACGTCGGCGGCACCGAGAACCTGCTGCACGCCGGCCAGCGGGCCGGGGTGCAGCG 
GTTCGTCTACACGTCATCCAACAGTGTGGTGATGGGCGGCCAGAACATCGCCGGCGGTGACGA
GACGCTGCCCTATACCGACCGGTTCAACGACCTCTACACCGAGACCAAGGTGGTTGCCGAGCG 
ATTCGTGTTGGCCCAGAACGGTGTCGACGGCATGCTGACGTGCGCGATCCGGCCCAGCGGCAT 
CTGGGGAAACGGCGATCAGACGATGTTCCGCAAGCTGTTCGAAAGTGTGCTCAAGGGCCACGT 
CAAGGTGCTGGTCGGGCGCAAGTCGGCCCGGCTGGATAACTCTTACGTGCACAACCTGATTCA 
CGGTTTCATCTTGGCCGCTGCCCATCTGGTGCCGGACGGCACAGCGCCCGGGCAGGCTTACTT
CATCAACGACGCAGAGCCGATCAATATGTTCGAGTTCGCTCGGCCGGTGCTCGAGGCGTGCGG 
GCAGCGCTGGCCGAAGATGCGGATTTCCGGCCCCGCGGTCCGCTGGGTAATGACGGGGTGGC 
AGCGGCTGCACTTCCGGTTCGGATTCCCCGCGCCGCTGCTCGAGCCGCTGGCCGTCGAACGAC
TGTACCTGGACAACTACTTTTCGATCGCTAAGGCACGCCGCGACCTGGGCTATGAGCCGCTGTT 
CACCACCCAGCAGGCGCTGACCGAATGCCTGCCGTACTACGTGAGTCTGTTTGAGCAGATGAA 
GAACGAGGCCCGGGCGGAAAAAACGGCCGCCACAGTCAAGCCGTAG 

>Rv1110 lytB2 TB.seq 1236183:1237187 MW:36298
>emb|AL123456|MTBH37RV:1236183-1237190, lytB 1 SEQ ID NO:42 

ATGGTTCCGACGGTCGACATGGGGATTCCCGGGGCTTCGGTATCGTCGCGATCGGTGGCCGAC 

CGTCCCAACCGTAAGCGGGTGCTGCTGGCCGAGCCGCGTGGCTACTGCGCTGGCGTGGATCG 

GGCCGTCGAAACGGTCGAACGCGCGCTTCAAAAACACGGCCCGCCTGTCTACGTGCGTCACGA 

GATCGTGCATAACCGCCACGTGGTTGACACCCTGGCTAAGGCCGGTGCGGTTTTCGTCGAAGA 

GACCGAGCAGGTTCCCGAGGGAGCGATTGTGGTGTTCTCCGCGCACGGGGTCGCGCCTACGG 

TGCACGTCAGCGCCAGCGAGCGCAACCTGCAGGTCATTGACGCCACCTGCCCGCTGGTCACCA 

AGGTGCACAACGAGGCCAGGCGGTTCGCCCGGGACGACTACGACATCTTGCTGATCGGTCATG 

AGGGCCACGAGGAAGTCGTCGGTACTGCTGGGGAAGCTCCCGATCATGTGCAGCTGGTCGACG 

GGGTGGACGCCGTCGACCAGGTGACCGTCCGTGACGAGGACAAAGTGGTTTGGCTGTCGCAG 

ACCACCCTGTCCGTCGATGAGACCATGGAGATTGTCGGGCGGTTGCGTCGGCGTTTCCCCAAG 

CTGCAGGATCCGCCCAGCGACGACATCTGCTATGCGACCCAGAATCGGCAGGTCGCGGTCAAG 

GCGATGGCGCCCGAGTGCGAGCTGGTCATCGTGGTCGGCTCGCGCAATTCGTCGAATTCGGTT 

CGGCTGGTCGAGGTGGCGCTGGGTGCCGGGGCGCGGGCCGCCCACCTGGTGGACTGGGCCG 

ACGATATCGACTCGGCCTGGCTGGACGGCGTTACCACGGTCGGCGTTACGTCGGGGGCATCGG 

TCCCCGAGGTGCTGGTGCGCGGTGTGCTGGAGCGGCTGGCCGAATGCGGCTACGACATCGTG 

CAACCGGTGACAACGGCCAACGAGACGTTGGTGTTCGCATTGCCCCGGGAGCTCCGCTCACCT 

CGCTGA 

>Rv1216c - TB.seq 1359473:1360144 MW:24863 

>emb|AL123456|MTBH37RV:c1360144-1359470, Rv1216c SEQ ID NO:43 

ATGCACATTGGGCTGAAGATATTCATATGGGGCGTGTTAGGACTCGTCGTTTTCGGCGCGCTCC 

TATTCGGGCCAGCCGGCACGTTCGACTATTGGCAGGCGTGGGTGTTCCTCGCCGCATTTGTGA 

GCACCACGATTGGCCCCACAATCTATCTGGCTCGCAACGATCCCGCGGCCCTTCAACGTCGCAT 

GCGCAGCGGTCCGCTCGCGGAGGGCCGAACGATTCAGAAGTTCATCGTCATCGGCGCTTTTCT 

GGGGTTCTTCGCGATGATGGTGCTGAGCGCGTGCGAGCATCGTTATGGTTGGTCGTCAGTGCC 

AGCCGCGGTGTGCGTGATCGGCGACGTCCTAGTGATGACGGGCCTTGGCATCGCCATGCTGGT 

GGTCATCCAGAACAGGTATGCCGCCTCGACGGTCAGGGTGGAGGCGGGCCAGATATTGGCCTC 

CGACGGTCTCTACAAAATTGTCCGACACCCGATGTACGCCGGGAACGTGGTCATGATGACAGG 

CATACCGCTGGCACTGGGCTCTTACTGGGCGATGTTCATCCTCGTCCCCGGCACACTGGTGTTG 
GTGTTCCGCATCCTCGACGAGGAAAAACTACTGACGCAAGAACTCAGCGGGTACCGCGAATACC
GGCAACTGGTGCGCTACCGGTTGGTGCCCTACGTGTGGTAG 

>Rv1223 htrA TB.seq 1365810:1367456 MW:56547
>emb|AL123456|MTBH37RV:1365810-1367459, htrA SEQ ID NO:44

GTGAGCCACTTGTCGCAGCGCATGGCGGGGTTGCTGCGAGTTCATGGCGAGTGGTCGCGATCC 
GTGGATACTAGGGTGGACACGGACAACGCGATGCCTGCACGTTTTAGCGCCCAGATTCAGAAT 
GAGGATGAGGTGACCTCCGACCAAGGCAACAACGGCGGCCCGAACGGCGGAGGCCGCCTGGC 
GCCGCGCCCGGTTTTTCGGCCACCGGTCGACCCGGCGTCGCGTCAAGCGTTCGGGCGTCCGT 

CCGGGGTCCAAGGGTCCTTTGTGGCCGAGCGTGTGCGCCCGCAGAAGTACCAGGACCAGTCT
GACTTCACACCGAACGATCAGCTTGCTGACCCGGTGCTTCAGGAGGCGTTCGGTCGTCCGTTC 
GCGGGCGCCGAATCGCTGCAGCGCCATCCCATCGATGCCGGAGCGCTGGCAGCTGAGAAAGA 
CGGTGCCGGCCCCGACGAGCCCGACGATCCGTGGCGCGACCCCGCGGCCGCGGCCGCGCTG 
GGGACGCCAGCGCTAGCCGCGCCGGCACCGCACGGTGCGCTGGCCGGCAGCGGCAAGCTGG 

GTGTGCGCGACGTGCTGTTTGGCGGCAAGGTGTCCTACTTGGCGCTGGGCATCTTGGTCGCTA
TCGCACTGGTGATCGGCGGCATCGGCGGTGTCATCGGCCGCAAGACCGCGGAAGTAGTCGAT 
GCGTTCACCACGTCGAAGGTGACCCTGTCGACCACTGGCAATGCCCAGGAACCGGCCGGCCG 
GTTCACCAAGGTGGCGGCCGCCGTGGCCGATTCGGTGGTGACCATTGAGTCGGTCAGCGACCA 
GGAGGGCATGCAAGGTTCCGGCGTCATCGTCGATGGCCGCGGCTACATCGTCACCAACAATCA 

CGTGATCTCTGAGGCGGCCAACAATCCCAGCCAGTTCAAGACGACCGTGGTGTTCAACGACGG
CAAGGAGGTGCCCGCCAATCTGGTGGGTCGTGACCCCAAGACCGACTTGGCCGTCCTCAAGGT 
CGACAACGTCGACAATCTGACCGTGGCCCGGCTCGGTGATTCCAGCAAGGTACGGGTCGGTGA 
CGAAGTCCTCGCGGTCGGCGCGCCCCTGGGGCTGCGCAGTACGGTGACCCAGGGCATTGTCA 
GCGCGCTACACCGCCCCGTTCCGTTGTCGGGCGAGGGCTCTGACACCGACACCGTCATTGACG 

CAATTCAGACCGACGCCTCGATCAACCACGGTAACTCCGGCGGTCCGCTAATCGACATGGATGC
CCAGGTGATTGGCATCAACACCGCCGGTAAGTCACTGTCGGATAGCGCCAGCGGGCTGGGCTT 
TGCGATCCCGGTCAACGAGATGAAATTGGTGGCAAATTCTCTGATCAAAGACGGAAAGATCGTG 
CATCCGACGTTGGGCATCAGCACCCGGTCAGTAAGCAACGCGATCGCGTCGGGCGCGCAGGT 
GGCCAATGTAAAGGCGGGAAGTCCCGCGCAGAAGGGCGGGATCTTGGAGAACGATGTGATCGT 

CAAGGTCGGTAACCGCGCGGTCGCCGACTCCGACGAGTTCGTCGTCGCCGTGCGCCAGTTGG
CTATCGGCCAGGACGCTCCGATAGAGGTGGTCCGCGAGGGTCGGCATGTGACGCTGACGGTG 
AAACCGGACCCCGATAGCACCTAG 

>Rv1224 - TB.seq 1367461:1367853 MW:14083 
>emb|AL123456|MTBH37RV:1367461-1367856, Rv1224 SEQ ID NO:45

GTGTTCGCCAACATCGGTTGGTGGGAAATGCTCGTCCTCGTCATGGTCGGGCTGGTGGTGCTT 
GGCCCGGAGCGGCTCCCGGGTGCCATCCGCTGGGCGGCAAGCGCTCTGCGGCAGGCGCGCG 
ACTATCTCAGCGGTGTGACCAGCCAGCTACGTGAGGACATTGGACCCGAATTCGATGATCTGCG
GGGACATCTCGGTGAGCTGCAGAAGCTACGGGGAATGACTCCGCGGGCTGCGTTGACCAAGCA 
CCTACTGGATGGCGATGATTCCCTGTTCACCGGAGACTTCGACCGACCGACGCCGAAGAAACC 
GGATGCGGCGGGCTCGGCGGGGCCGGACGCTACTGAGCAGATCGGTGCGGGGCCCATCCCG 
TTTGACAGCGATGCCACCTAG

>Rv1229c mrp similar to MRP/NBP35 ATP-binding proteins TB.seq 1371778:1372947 MW:41064 
>emb|AL123456|MTBH37RV:c1372947-1371775, mrp SEQ ID NO:46 

ATGCCAAGCCGCCTACACTCGGCGGTGATGTCCGGAACTCGTGATGGCGACCTGAACGCGGCG 

ATACGCACCGCGCTGGGCAAGGTAATCGACCCCGAATTGCGGCGCCCCATCACCGAACTGGGG
ATGGTCAAAAGCATCGACACCGGCCCGGATGGGAGCGTGCACGTCGAGATCTACCTGACCATC 
GCCGGCTGCCCGAAGAAGTCCGAAATCACCGAGCGTGTCACCCGGGCGGTCGCCGACGTGCC 
AGGCACTTCGGCGGTGCGGGTCAGCTTGGACGTGATGAGCGACGAGCAGCGCACCGAGCTGC 
GTAAGCAGTTGCGTGGCGATACCCGCGAACCCGTCATCCCGTTCGCGCAACCCGATTCCTTGAC 

CCGGGTGTATGCCGTGGCTTCCGGTAAGGGCGGAGTCGGAAAGTCCACCGTCACGGTCAACCT
GGCCGCCGCGATGGCCGTCCGCGGCCTGTCGATCGGGGTGCTGGACGCTGATATCCACGGCC 
ACTCTATCCCCCGGATGATGGGCACCACCGACCGGCCTACCCAGGTTGAGTCGATGATCCTGC 
CGCCGATCGCCCACCAGGTGAAGGTCATCTCGATAGCCCAGTTCACCCAGGGCAACACCCCGG 
TGGTGTGGCGCGGGCCGATGCTGCACCGGGCGTTGCAGCAGTTTCTGGCCGACGTGTACTGG 

GGGGATCTGGACGTGCTGCTGCTGGACTTGCCGCCCGGAACCGGCGACGTCGCCATCTCGGT
GGCTCAACTGATCCCCAACGCCGAACTCCTGGTGGTCACCACCCCGCAGCTGGCCGCCGCGGA 
GGTGGCCGAACGGGCCGGCAGCATCGCGCTGCAAACCCGCCAACGCATCGTCGGCGTCGTGG 
AGAACATGTCGGGGCTCACGCTGCCGGACGGCACCACGATGCAGGTGTTCGGCGAGGGCGGT 
GGCCGGCTGGTCGCCGAGCGGTTGTCGCGTGCGGTCGGCGCCGACGTGCCGCTGCTGGGTCA 

GATCCCGCTGGACCCCGCACTGGTGGCCGCCGGCGATTCGGGCGTACCGCTCGTGTTGAGCT
CGCCGGACTCGGCGATCGGCAAGGAACTGCATAGCATCGCCGACGGCTTGTCGACTCGACGAC 
GCGGATTGGCGGGCATGTCGCTGGGGTTGGACCCGACACGACGCTAG 



>Rv1239c corA magnesium and cobalt transport protein TB.seq 1381943:1383040 MW:41470 

>emb|AL123456|MTBH37RV:c1383040-1381940, corA SEQ ID NO:47

GTGTTCCCAGGGTTTGACGCATTGCCCGAAGTGCTGCGACCGGTCGCGCGACCCCAGCCGCCG 
AACGCACACCCCGTTGCCCAGCCACCGGCCCAAGCCTTGGTCGACTGCGGTGTCTACGTCTGC 
GGCCAGCGACTGCCCGGCAAGTACACCTACGCCGCCGCGCTGCGCGAGGTGCGCGAGATCGA 
ACTGACCGGGCAGGAGGCGTTCGTCTGGATCGGGCTGCACGAGCCCGATGAAAACCAGATGCA 

GGACGTAGCAGACGTTTTCGGGTTGCACCCGTTAGCCGTTGAGGACGCCGTGCACGCGCACCA
GCGACCCAAGTTGGAGCGCTACGACGAGACGCTGTTCCTCGTCCTCAAGACCGTCAACTACGT 
CCCGCACGAATCGGTGGTACTGGCCCGCGAGATCGTCAAAACCGGCGAGATCATGATCTTCGT 
CGGCAAGGATTTCGTGGTCACCGTCCGCCACGGCGAACACGGCGGGTTATCCGAGGTGCGTAA
GCGGATGGATGCCGACCCCGAACATTTGCGGTTGGGACCGTATGCGGTGATGCACGCGATCGC 
CGACTACGTGGTCGACCACTACCTCGAGGTGACCAATCTCATGGAGACCGATATCGACAGCATC 
GAGGAAGTAGCGTTCGCGCCGGGCCGCAAGCTCGACATCGAACCGATCTATCTGCTCAAGCGG 
GAAGTGGTCGAGTTGCGCCGGTGCGTGAATCCGCTATCGACCGCATTCCAGCGCATGCAGACC
GAGAGCAAAGACCTCATTTCGAAAGAAGTGCGGCGCTACCTGCGCGACGTCGCCGACCACCAG 
ACCGAGGCCGCCGACCAGATCGCCAGCTACGACGACATGCTCAACTCGCTGGTGCAGGCCGC 
GCTCGCCCGGGTCGGCATGCAGCAAAACATGGACATGCGCAAGATATCCGCGTGGGCAGGTAT 
CATCGCGGTCCCCACCATGATCGCGGGCATCTATGGCATGAACTTTCACTTCATGCCCGAGCTG 
GACTCCAGGTGGGGTTACCCGACAGTGATCGGCGGGATGGTCCTTATCTGTCTGTTCCTCTACC
ACGTCTTCCGCAACAGAAACTGGCTCTAG 

>Rv1279 - TB.seq 1430060:1431643 MW:57332

>emb|AL123456|MTBH37RV:1430060-1431646, Rv1279 SEQ ID NO:48 

ATGGACACTCAGAGCGACTACGTCGTGGTCGGTACCGGCTCAGCCGGGGCGGTTGTGGCCAG
CCGGCTTAGCACCGATCCGGCCACGACGGTGGTGGCCCTGGAGGCGGGGCCGCGTGACAAGA 
ACAGATTCATCGGCGTCCCAGCGGCGTTTTCCAAGCTGTTCCGCAGCGAGATCGACTGGGATTA 
CCTAACCGAACCGCAGCCGGAGCTCGACGGCCGCGAAATCTATTGGCCTCGTGGCAAGGTGCT 
CGGTGGCTCGTCGTCCATGAACGCAATGATGTGGGTGCGTGGATTCGCATCAGACTACGATGA 

GTGGGCCGCGCGAGCCGGTCCGCGGTGGTCGTACGCCGACGTGCTCGGCTACTTTCGCCGCA
TCGAGAACGTCACCGCTGCCTGGCACTTTGTCAGCGGTGACGACAGCGGAGTAACCGGTCCGT 
TGCATATTTCCCGGCAACGCAGCCCAAGATCGGTGACCGCAGCGTGGCTGGCAGCCGCACGTG 
AGTGCGGATTTGCCGCTGCGCGGCCGMTTCCCCTCGACCGGAAGGCTTTTGCGAGACCGTCG 
TCACCCAGCGCCGCGGTGCTCGATTCAGTACTGCCGACGCCTATCTGAAGCCCGCGATGCGCC 

GTAAAAACCTCCGTGTGCTTACCGGCGCCACTGCTACCCGGGTGGTCATCGACGGCGACCGGG
CCGTCGGCGTGGAATACCAAAGCGACGGTCAAACCCGCATCGTCTACGCCCGCCGCGAGGTG 
GTGCTCTGCGCTGGTGCCGTCAACAGCCCTCAGCTGCTGATGCTCTCCGGCATCGGCGACCGC 
GACCACCTCGCCGAACACGACATCGACACCGTTTACCACGCGCCCGAGGTCGGGTGCAACCTG 
CTCGATCATCTCGTCACGGTGCTGGGTTTCGACGTCGAAAAGGACAGCTTGTTTGCCGCCGAGA 

AGCCCGGCCAGTTGATCAGCTACTTACTGCGACGCCGCGGCATGCTCACCTCCAACGTCGGCG
AGGCGTACGGATTTGTCCGCAGCCGACCCGAACTGAAGCTGCCCGATTTGGAGTTGA l I I I IGC 
CCCGGCGCCGTTTTACGACGAAGCGCTGGTTCCACCGGCTGGTCACGGTGTGGTATTCGGCCC 
GATTCTGGTCGCGCCGCAAAGCCGTGGCCAGATCACGCTGCGGTCCGCCGATCCGCATGCCAA 
GCCTGTCATCGAACCGCGTTACCTGTCCGATCTCGGTGGCGTAGACCGGGCCGCCATGATGGC 

GGGCCTGCGGATATGCGCGCGGATCGCGCAGGCCCGCCCGCTCAGAGATCTCCTTGGGTCCA
TCGCGCGACCGCGCAACAGCACCGAGCTGGACGAGGCCACTCTCGAGTTGGCGCTGGCCACT 
TGTTCGCACACCCTGTACCACCCGATGGGCACCTGCCGCATGGGCAGCGACGAGGCCAGCGT 
GGTGGATCCGCAGCTGCGGGTCCGCGGTGTCGACGGACTCCGCGTCGCCGACGCGTCGGTGA

TGCCCAGCACGGTTCGTGGGCATACGCATGCGCCGTCGGTGCTGATCGGGGAGAAGGCCGCC 

GACTTAATCCGCAGCTGA 

>Rv1294 thrA homoserine dehydrogenase TB.seq 1449373:1450695 MW:45522
>emb|AL123456|MTBH37RV:1449373-1450698, thrA SEQ ID NO:49 

gtgcccggtgacgaaaagccggtcggcgtagcggtactcggtttgggcaacgtcggcagcga 
ggttgtccgcatcatcgagaacagcgccgaggatctcgcggctcgtgtcggtgccccattggt 
cctgcggggcatcggcgtgcgccgcgtgacgaccgatcgcggcgtgccgatcgaattgttga 

ccgacgacattgaagagctcgtggcccgcgaggatgtcgatatcgtggtggaagtgatggggc
cggtggaaccgtcgcgcaaggcgatcctgggcgcccttgagcgcggcaagtccgtcgttacg 
gcgaacMggctttactcgccacctccaccggcgaattggcacaggccgccgaaagcgcccat 
gttgatctgtatttcgaggcggccgtggcgggcgccattccggtcatccgtccgctcacccag 
tcgctggccggcgacacggtgctgcgagtggccgggatcgtcaacggcaccaccaactacatc 

ctctcggcgatggacagcaccggcgctgactatgccagcgccctggccgacgcaagtgcgct
gggctatgcggaggctgatcccaccgcagacgtcgaaggctacgacgccgcggccaaggcag 
cgatcctggcatccattgccttccacacccgggtgaccgcagacgacgtgtatcgcgaaggca 
tcaccaaggtcactccggccgacttcggatccgcgcacgcgctgggttgcaccatcaaactgc 
tgtcgatctgtgagcgcataaccaccgacgaaggttcgcagcgggtatcggcccgcgtctatc 

CGGCCCTGGTACCTCTGTCGCATCCGCTTGCCGCGGTCAACGGCGCGTTCAATGCCGTGGTGG
TCGAGGCCGAGGCCGCGGGCCGGCTGATGTTCTACGGCCAGGGCGCGGGCGGCGCGCCGAC 
CGCCTCTGCGGTGACCGGTGACCTAGTGATGGCCGCCCGCAACCGGGTACTCGGCAGCCGCG 
GCCCCCGTGAGTCTAAATACGCTCAACTTCCGGTGGCACCAATGGGTTTCATTGAAACGCGCTA 
TTACGTCAGCATGAACGTCGCCGACAAGCCGGGCGTCTTGTCCGCGGTGGCGGCGGAATTCGC 

CAAACGCGAGGTGAGCATCGCCGAGGTGCGCCAGGAGGGCGTTGTGGACGAAGGTGGTCGAC
GGGTGGGAGCCCGAATCGTGGTGGTCACGCACCTCGCCACTGACGCCGCACTCTCGGAAACC 
GTTGATGCACTGGACGACTTGGATGTCGTGCAGGGTGTGTCCAGCGTGATACGACTGGAAGGA 
ACCGGCTTATGA 

>Rv1323 fadA4 acetyl-CoA C-acetyltransferase (aka thiL) TB.seq 1485860:1487026 MW:40049
>emb|AL123456|MTBH37RV:1485860-1487029, fadA4 SEQ ID NO:50 

GTGATTGTTGCTGGCGCGCGTACACCCATCGGCAAGTTGATGGGCTCCCTGAAGGATTTCAGCG 
CCAGCGAGCTGGGTGCCATCGCCATTAAGGGCGCCCTGGAGAAGGCCAACGTGCCGGCGTCC 
TTGGTCGAGTACGTGATCATGGGCCAGGTGTTGACCGCGGGTGCCGGGCAAATGCCCGCACG 
GCAGGCGGCAGTGGCGGCCGGCATCGGTTGGGATGTCCCTGCGCTGACGATCAACAAGATGT
GCCTGTCCGGCATCGACGCAATCGCGCTGGCTGATCAACTCATTCGGGCCAGAGAGTTCGACG 
TGGTGGTGGCCGGCGGTCAGGAGTCGATGACGAAGGCGCCCCACCTGTTGATGAATAGCCGGT 
CGGGTTACAAGTACGGCGACGTTACGGTTTTGGACCACATGGCCTACGACGGTCTGCACGACG
TGTTCACCGATCAGCCGATGGGCGCGCTCACCGAGCAACGCAACGACGTCGACATGTTCACCC 
GCTCCGAACAGGACGAGTACGCGGCTGCGTCCCACCAAAAGGCGGCCGCGGCATGGAAGGAC 
GGCGTATTCGCCGACGAGGTGATCCCGGTGAACATCCCGCAGCGCACGGGCGATCCACTGCA 
GTTCACCGAGGACGAGGGGATCCGCGCCAACACCACCGCCGCCGCGCTGGCCGGTCTGAAGC
CGGCGTTCCGTGGCGACGGCACCATCACCGCCGGGTCGGCGTCACAGATCTCCGACGGTGCG 
GCCGCGGTGGTGGTCATGAACCAGGAAAAGGCCCAGGAACTGGGGCTGACCTGGCTAGCCGA 
GATCGGCGCCCACGGTGTGGTGGCCGGGCCGGATTCCACACTGCAATCGCAGCCGGCCAACG 
CGATCAACAAGGCGCTGGATCGCGAGGGCATCTCGGTGGACCAGCTCGACGTGGTGGAGATCA 
ACGAGGCGTTCGCTGCGGTGGCATTGGCCTCGATACGCGAACTCGGGCTGAACCCCCAGATCG
TCAACGTCAACGGTGGTGCGATTGCCGTCGGGCATCCCCTCGGCATGTCAGGGACGCGAATCA 
CGCTACATGCGGCGCTGCAGTTGGCACGCCGGGGATCGGGCGTCGGGGTTGCCGCATTGTGC 
GGGGCTGGCGGGCAGGGCGACGCACTGATATTGCGGGCCGGATAG 

>Rv1389 gmk putative guanylate kinase TB.seq 1564399:1565022 MW:22064
>emb|AL123456|MTBH37RV:1564399-1565025, gmk SEQ ID NO:51

GTGAGCGTCGGCGAGGGACCGGACACCAAGCCCACCGCGCGTGGCCAACCGGCGGCAGTGG 

GACGTGTGGTGGTGCTGTCCGGTCCTTCCGCGGTCGGCAAATCCACGGTGGTTCGGTGTCTGC 

GCGAGCGGATCCCGAATCTGCATTTCAGTGTCTCGGCCACGACGCGGGCGCCACGCCCGGGC 

GAGGTCGACGGTGTCGACTACCACTTCATCGACCCCACCCGCTTTCAGCAGCTCATCGACCAG
GGTGAGTTGCTGGAATGGGCAGAAATCCACGGCGGCCTGCACCGGTCGGGCACTTTGGCCCA 
GCCGGTGCGGGCGGCCGCGGCGACTGGTGTGCCGGTGCTTATCGAGGTTGACCTGGCCGGGG 
CCAGGGCGATCAAGAAGACGATGCCCGAGGCTGTCACCGTGTTTCTGGCGCCACCTAGCTGGC 
AGGATCTTCAGGCCAGACTGATTGGCCGCGGCACCGAAACAGCTGACGTTATCCAACGCCGCC 

TGGACACCGCGCGGATCGAATTGGCAGCGCAGGGCGACTTTGACAAGGTCGTGGTGAACAGGC
GATTAGAGTCTGCGTGTGCGGAATTGGTATCCTTGCTGGTGGGAACGGCACCGGGCTCCCCGT 
GA 

>Rv1407 fmu similar to Fmu protein TB.seq 1583099:1584469 MW:48494 

>emb|AL123456|MTBH37RV:1583099-1584472, fmu SEQ ID NO:52

ATGACCCCTAGATCGCGTGGGCCGCGCCGCCGGCCGCTGGACCCGGCGCGTCGTGCGGCCTT 
CGAGACGCTGCGGGCGGTTAGTGCGCGCGACGCCTACGCGAACCTGGTGTTGCCCGCGCTGC 
TGGCCCAACGCGGTATCGGCGGTCGCGACGCCGCGTTCGCCACCGAGCTGACATACGGCACC 
TGCCGAGCCCGCGGCCTGCTCGACGCGGTCATCGGTGCGGCCGCCGAGCGTTCGCCGCAGGC 

GATCGATCCGGTGCTGCTAGACCTGTTGCGGCTCGGCACCTACCAATTGCTGCGCACGCGGGT
CGACGCACACGCCGCAGTGTCGACCACCGTCGAGCAGGCCGGAATCGAATTCGATTCGGCGC 
GAGCAGGTTTCGTCAACGGTGTACTACGAACGATCGCCGGCCGAGACGAGCGGTCCTGGGTTG 
GCGAACTCGCTCCTGATGCGCAGAACGATCCGATCGGGCATGCCGCGTTCGTGCATGCGCATC
CCCGATGGATCGCCCAGGCCTTTGCTGACGCGTTGGGCGCGGCGGTCGGGGAGCTCGAGGCA 
GTTTTGGCCAGCGACGACGAACGGCCAGCGGTGCACCTGGCGGCACGCCCCGGGGTGCTGAC 
CGCCGGCGAACTGGCCCGCGCGGTGCGCGGAACCGTCGGTCGGTATTCGCCGTTTGCGGTGT 
ATCTGCCGCGCGGTGACCCGGGGCGACTGGCGCCGGTGCGCGACGGCCAAGCGCTGGTCCA
GGACGAGGGCAGCCAGTTAGTCGCCCGAGCATTGACCCTGGCGCCAGTCGACGGCGATACCG 
GACGGTGGCTGGACCTGTGTGCCGGACCGGGCGGCAAGACCGCGCTGTTGGCCGGGCTGGGT 
TTGCAGTGCGCAGCCCGGGTGACCGCGGTGGAACCCTCGCCACACCGCGCGGACCTGGTAGC 
ACAGAACACCCGCGGGCTGCCGGTTGAGCTCTTGCGTGTCGACGGGCGGCACACCGACCTCG 

ACCCGGGTTTCGACCGGGTGCTGGTGGATGCGCCCTGCACCGGGCTGGGCGCGTTACGCCGT
CGGCCGGAGGCCCGTTGGCGTCGTCAGCCGGCGGACGTAGCGGCACTGGCCAAGCTACAACG 
CGAGTTGTTGAGCGCCGCCATCGCGCTGACTCGGCCCGGCGGTGTCGTGCTCTATGCCACATG 
CTCGCCGCACCTGGCCGAGACTGTGGGTGCTGTCGCCGACGCGCTACGCCGACATCCGGTTCA 
CGCGCTCGATACCCGCCCACTGTTCGAGCCGGTGATCGCGGGGCTGGGGGAGGGGCCCCACG 

TTCAGCTGTGGCCGCACCGGCACGGTACCGACGCCATGTTCGCCGCGGCGTTGCGCCGCCTG
ACGTGA 

>Rv1409 ribG riboflavin biosynthesis TB.seq 1585192:1586208 MW:35367 
>emb|AL123456|MTBH37RV:1585192-1586211, ribG SEQ ID NO:53 

ATGAACGTGGAGCAGGTCAAGAGCATCGACGAGGCTATGGGTCTCGCCATCGAGCACTCCTAC
CAGGTCAAAGGCACGACTTATCCAAAACCCCCAGTGGGGGCCGTCATTGTGGATCCCAACGGT 
CGGATCGTCGGCGCCGGCGGCACCGAGCCGGCCGGTGGCGATCATGCCGAGGTGGTGGCGC 
TGCGCCGGGCCGGCGGATTGGCTGCCGGCGCCATCGTGGTGGTCACCATGGAACCCTGTAAC 
CACTACGGCAAGACTCCGCCATGCGTGAACGCTCTGATCGAAGCCAGGGTGGGGACGGTGGTC 

TACGCCGTCGCCGACCCGAACGGGATCGCTGGGGGTGGCGCGGGCCGGCTGTCAGCAGCGG
GCCTACAGGTGCGGTCCGGGGTGTTGGCTGAACAGGTGGCGGCCGGACCGCTGCGGGAGTGG 
CTCCACAAGCAACGCACCGGTCTGCCGCATGTCACCTGGAAGTACGCCACCAGCATCGACGGC 
CGCAGCGCCGCCGCCGACGGCTCCAGCCAGTGGATCTCCAGCGAGGCCGCACGCCTGGATCT 
GCATCGCCGCCGCGCCATCGCCGACGCGATCTTGGTCGGCACCGGCACCGTCCTCGCCGACG 

ACCCGGCCCTGACCGCGCGGCTGGCCGACGGCTCGCTGGCGCCGCAGCAGCCGCTGCGCGT
GGTGGTGGGCAAGCGCGACATACCGCCGGAAGCACGGGTCCTCAACGACGAGGCACGCACCA 
TGATGATCCGCACCCACGAACCTATGGAGGTGCTCAGGGCGTTGTCGGATCGCACCGACGTGC 
TGCTGGAAGGAGGTCCCACCCTCGCCGGCGCCTTCCTACGAGCGGGTGCGATCAACCGGATCC 
TGGCCTACGTCGCACCGATCCTGTTGGGCGGTCCGGTTACCGCGGTCGATGACGTCGGGGTGT 

CCAACATCACCAACGCGTTGCGTTGGCAGTTCGACAGCGTCGAAAAGGTCGGACCGGATCTGTT
GCTGAGCTTGGTGGCTCGTTAG 
>Rv1440 secG TB.seq 1617715:1618065 MW:12140
>emb|AL123456|MTBH37RV:1617715-1618068, secG SEQ ID NO:54

GTGGCAGGCGTGACAGCCGCGGTCAGTGCACGCCTCAAAGCCGATGAGGCGCGACGGCCTGG 
GTTCTACGCGGCAGGCAGCGGTCCGCTGCCGCAGGTTCGGGGGAGTACGCTACCCGTCATGG 
AATTGGCCCTGCAGATCACGCTGATCGTCACGAGCGTGCTGGTGGTGTTGTTAGTACTGCTGCA
CCGGGCCAAGGGTGGCGGGCTATCGACACTGTTCGGCGGTGGTGTGCAGTCAAGCCTGTCCG 
GCTCGACGGTGGTGGAGAAGAACCTGGACCGGTTGACGCTGTTCGTTACCGGCATCTGGCTGG 
TGTCCATCATCGGCGTGGCGTTGCTCATCAAATACCGCTAG 

>Rv1484 inhA TB.seq 1674200:1675006 MW:28529

>emb|AL123456|MTBH37RV:1674200-1675009, inhA SEQ ID NO:55 

ATGACAGGACTGCTGGACGGCAAACGGATTCTGGTTAGCGGAATCATCACCGACTCGTCGATCG 

CGTTTCACATCGCACGGGTAGCCCAGGAGCAGGGCGCCCAGCTGGTGCTCACCGGGTTCGAC 

CGGCTGCGGCTGATTCAGCGCATCACCGACCGGCTGCCGGCAAAGGCCCCGCTGCTCGAACT 

CGACGTGCAAAACGAGGAGCACCTGGCCAGCTTGGCCGGCCGGGTGACCGAGGCGATCGGGG
CGGGCAACAAGCTCGACGGGGTGGTGCATTCGATTGGGTTCATGCCGCAGACCGGGATGGGC 
ATCAACCCGTTCTTCGACGCGCCCTACGCGGATGTGTCCAAGGGCATCCACATCTCGGCGTATT 
CGTATGCTTCGATGGCCAAGGCGCTGCTGCCGATCATGAACCCCGGAGGTTCCATCGTCGGCA 
TGGACTTCGACCCGAGCCGGGCGATGCCGGCCTACAACTGGATGACGGTCGCCAAGAGCGCG 

TTGGAGTCGGTCAACAGGTTCGTGGCGCGCGAGGCCGGCAAGTACGGTGTGCGTTCGAATCTC
GTTGCCGCAGGCCCTATCCGGACGCTGGCGATGAGTGCGATCGTCGGCGGTGCGCTCGGCGA 
GGAGGCCGGCGCCCAGATCCAGCTGCTCGAGGAGGGCTGGGATCAGCGCGCTCCGATCGGCT 
GGAACATGAAGGATGCGACGCCGGTCGCCAAGACGGTGTGCGCGCTGCTGTCTGACTGGCTG 
CCGGCGACCACGGGTGACATCATCTACGCCGACGGCGGCGCGCACACCCAATTGCTCTAG 


>Rv1617 pykA pyruvate kinase TB.seq 1816187:1817602 MW:50668 
>emb|AL123456|MTBH37RV:1816187-1817605, pykA SEQ ID NO:56 

GTGACGAGACGCGGGAAAATCGTCTGCACTCTCGGGCCGGCCACCCAGCGGGACGACCTGGT 
CAGAGCGCTGGTCGAGGCCGGAATGGACGTCGCCCGAATGAACTTCAGCCACGGCGACTACGA 

CGATCACAAGGTCGCCTATGAGCGGGTCCGGGTAGCCTCCGACGCCACCGGGCGCGCGGTCG
GCGTGCTCGCCGACCTGCAGGGCCCGAAGATCAGGTTGGGACGCTTCGCCTCCGGGGCCACC 
CACTGGGCCGAAGGCGAAACCGTCCGGATCACCGTGGGCGCCTGCGAGGGCAGCCACGATCG 
GGTGTCCACCACCTACAAGCGGCTAGCCCAGGACGCGGTGGCCGGTGACCGGGTGCTGGTCG 
ACGACGGCAAAGTCGCATTGGTGGTCGACGCCGTCGAGGGCGACGACGTGGTCTGCACCGTC 

GTCGAAGGCGGCCCGGTCAGCGACAACAAGGGCATCTCGTTGCCCGGAATGAACGTGACCGC
GCCGGCCCTGTCGGAGAAGGACATCGAGGATCTCACGTTCGCGCTGAACCTCGGCGTCGACAT 
GGTGGCGCTTTCCTTCGTCCGCTCCCCGGCCGATGTCGAACTGGTCCACGAGGTGATGGATCG 
GATCGGGCGACGGGTGCCGGTGATCGCCAAGCTGGAGAAGCCGGAAGCCATCGACAATCTCG
AAGCGATCGTGCTGGCGTTCGACGCCGTCATGGTCGCTCGGGGCGACCTAGGTGTTGAGCTGC 
CGCTCGAAGAGGTCCCGCTGGTACAGAAGCGAGCCATCCAGATGGCCCGGGAGAACGCCAAG 
CCGGTCATTGTGGCGACCCAGATGCTCGACTCGATGATCGAGAACTCGCGGCCGACCCGAGCT 
GAGGCCTCCGACGTCGCCAACGCGGTGCTCGATGGCGCCGACGCGCTGATGCTGTCCGGGGA
AACCTCGGTAGGGAAGTACCCCCTTGCTGCGGTCCGGACAATGTCGCGCATCATCTGCGCGGT 
CGAGGAGAACTCCACGGCCGCACCGCCGTTGACACACATTCCCCGGACCAAGCGTGGGGTCAT 
CTCGTATGCGGCCCGTGACATCGGCGAACGACTCGACGCCAAGGCCTTGGTGGCCTTCACTCA 
GTCCGGTGATACCGTGCGGCGACTGGCCCGCCTGCATACCCCGCTGCCGCTGCTGGCCTTCAC 
CGCGTGGCCCGAGGTGCGCAGCCAACTGGCGATGACCTGGGGCACCGAGACGTTCATCGTGC
CGAAGATGCAGTCCACCGATGGCATGATCCGCCAGGTCGACAAATCGCTGCTCGAACTCGCCC 
GCTACAAGCGTGGTGACTTGGTGGTCATCGTCGCGGGTGCGCCGCCAGGCACAGTGGGTTCGA 
CCAACCTGATCCACGTGCACCGGATCGGGGAAGATGACGTCTAG 

>Rv1630 rpsA 30S ribosomal protein S1 TB.seq 1833540:1834982 MW:53203
>emb|AL123456|MTBH37RV:1833540-1834985, rpsA SEQ ID NO:57 

ATGCCGAGTCCCACCGTCACCTCGCCGCAAGTAGCCGTCAACGACATAGGCTCTAGCGAGGAC 
TTTCTCGCCGCAATAGACAAAACGATCAAGTACTTCAACGATGGCGACATCGTCGAAGGCACCA 
TCGTCAAAGTGGACCGGGACGAGGTGCTCCTCGACATCGGCTACAAGACCGAAGGCGTGATCC 

CCGCCCGCGAACTGTCCATCAAGCACGACGTCGACCCCAACGAGGTCGTTTCCGTCGGTGACG
AGGTCGAAGCCCTGGTGCTCACCAAGGAGGACAAAGAGGGCCGGCTCATCCTCTCCAAGAAAC 
GCGCGCAGTACGAGCGTGCCTGGGGCACCATCGAGGCGCTCAAGGAGAAGGACGAGGCCGTC 
AAGGGCACGGTCATCGAGGTCGTCAAGGGTGGCCTGATCCTCGACATCGGGCTGCGCGGTTTC 
CTGCCCGCCTCGCTGGTGGAGATGCGCCGGGTGCGCGACCTGCAGCCCTACATCGGCAAGGA 

GATCGAGGCCAAGATCATCGAGCTGGACAAGAACCGCAACAACGTGGTGCTGTCCCGTCGCGC
CTGGCTGGAGCAGACCCAGTCCGAGGTGCGCAGCGAGTTCCTGAATAACTTGCAAAAAGGCAC 
CATCCGAAAGGGTGTCGTGTCCTCGATCGTCAACTTCGGCGCGTTCGTCGATCTCGGCGGTGT 
GGACGGTCTGGTGCATGTCTCCGAGCTATCGTGGAAGCACATCGACCACCCGTCCGAGGTGGT 
CCAGGTTGGTGACGAGGTCACCGTCGAGGTGCTCGACGTCGACATGGACCGTGAGCGGGTTTC 

GTTGTCACTCAAGGCGACTCAGGAAGACCCGTGGCGGCACTTCGCCCGCACTCACGCGATCGG
GCAGATCGTGCCGGGCAAGGTCACCAAGTTGGTTCCGTTCGGTGCATTCGTCCGCGTCGAGGA 
GGGTATCGAGGGCCTGGTGCACATCTCCGAGCTGGCCGAGCGTCACGTCGAGGTGCCCGATC 
AGGTGGTTGCCGTCGGCGACGACGCGATGGTCAAGGTCATCGACATCGACCTGGAGCGCCGTC 
GGATCTCGTTGTCGCTCAAGCAAGCCAATGAGGACTACACCGAGGAGTTCGACCCGGCGAAGT 

ACGGCATGGCCGACAGTTACGACGAGCAGGGCAACTACATCTTCCCCGAGGGCTTCGATGCCG
AAACCAACGAATGGCTTGAGGGATTCGAAAAGCAGCGCGCCGAATGGGAAGCTCGGTACGCCG 
AGGCCGAGCGCCGGCACAAGATGCACACCGCGCAGATGGAGAAGTTCGCCGCCGCCGAGGCG 
GCTGGACGCGGCGCGGACGATCAGTCGTCGGCCAGTAGCGCACCGTCGGAAAAGACCGCGGG

TGGATCACTGGCCAGCGACGCCCAGCTGGCGGCCCTGCGGGAAAAACTCGCCGGCAGCGCTT 

GA 

>Rv1631 - TB.seq 1835011:1836231 MW:44669

>emb|AL123456|MTBH37RV:1835011-1836234, Rv1631 SEQ ID NO:58 

ATGCTGCGCATCGGGCTGACCGGCGGCATTGGCGCCGGGAAGTCGTTGCTGTCCACGACGTTC 
TCGCAATGCGGCGGAATCGTTGTCGACGGCGATGTGTTGGCGCGTGAAGTGGTCCAGCCGGGC 
ACCGAGGGGCTGGCCTCGCTGGTCGACGCGTTCGGTCGCGACATCCTGCTTGCAGACGGAGC 

GCTGGACCGGCAGGCGTTGGCGGCCAAGGCGTTTCGAGATGACGAGTCGCGCGGTGTGCTCA
ACGGAATCGTGCACCCGCTGGTCGCCCGGCGCCGATCCGAGATCATCGCGGCGGTTTCGGGG 
GACGCGGTTGTGGTCGAAGATATTCCACTGCTGGTGGAATCCGGGATGGCGCCATTGTTTCCGC 
TGGTGGTGGTGGTGCACGCCGACGTCGAGCTACGGGTGCGACGGCTGGTCGAGCAACGCGGC 
ATGGCCGAAGCCGACGCCCGGGCTAGGATCGCTGCGCAGGCCAGCGACCAGCAGCGTCGTGC 

CGTCGCCGACGTCTGGCTGGACAACTCGGGCAGCCCAGAGGATTTGGTGCGGCGGGCCCGCG
ACGTCTGGAACACGCGCGTCCAGCCCTTCGCGCACAACCTGGCCCAACGTCAGATTGCGCGCG 
CGCCGGCTAGGTTGGTGCCGGCGGATCCAAGCTGGCCGGATCAGGCGCGGCGCATCGTCAAC 
CGGCTAAAGATCGCGTGCGGGCATAAGGCCTTGCGAGTTGACCACATTGGGTCAACCGCCGTG 
TCGGGCTTCCCCGATTTTCTAGCCAAGGATGTCATCGACATCCAGGTCACCGTCGAATCACTTG 

ACGTGGCCGACGAGCTGGCCGAGCCCTTGCTGGCCGCCGGCTACCCACGCCTCGAGCACATC
ACCCAGGACACCGAAAAGACCGACGCTCGCAGCACCGTCGGCCGCTACGACCACACCGACAGT 
GCCGCTCTGTGGCACAAGCGCGTGCACGCCTCGGCGGATCCCGGTCGGCCGACCAACGTGCA 
CCTGCGGGTGCACGGCTGGCCCAACCAACAGTTCGCCCTGCTGTTCGTCGACTGGCTGGCGGC 
CAATCCCGGCGCGAGAGAAGACTATTTGACGGTCAAGTGTGACGCCGACAGGCGCGCCGACG 

GTGAGCTCGCGCGCTACGTCACCGCCAAGGAGCCGTGGTTCCTGGATGCCTACCAGCGGGCAT
GGGAGTGGGCGGATGCGGTGCACTGGCGTCCCTGA 

>Rv1706c - TB.seq 1932695:1933876 MW:39779 
>emb|AL123456|MTBH37RV:c1933876-1932692, PPE SEQ ID NO:59

ATGACCCTCGATGTCCCGGTCAACCAGGGGCATGTCCCCCCGGGCAGCGTCGCCTGCTGCCTT
GTTGGGGTCACCGCCGTTGCTGACGGCATCGCCGGGCATTCCCTGTCCAACTTTGGGGCGTTA 
CCTCCCGAGATCAATTCGGGTCGTATGTATAGCGGTCCGGGATCCGGGCCACTGATGGCTGCC 
GCGGCGGCCTGGGACGGGCTGGCCGCAGAGTTGTCGTCGGCAGCGACTGGCTACGGTGCGG 
CGATCTCGGAGCTGACAAACATGCGGTGGTGGTCGGGGCCGGCATCGGATTCGATGGTGGCC 

GCCGTCCTGCCCTTTGTCGGCTGGCTGAGTACCACCGCGACGCTAGCCGAACAGGCCGCGATG
CAGGCTAGGGCGGCCGCAGCGGCCTTTGAAGCCGCCTTCGCCATGACGGTGCCCCCGCCGGC 
GATCGCGGCCAACCGGACCTTGTTGATGACGCTCGTCGATACCAACTGGTTCGGGCAAAACAC 
GCCGGCGATCGCCACCACCGAGTCCCAATACGCCGAGATGTGGGCCCAAGACGCCGCCGCGA
TGTACGGCTATGCCAGCGCCGCGGCACCCGCCACGGTTTTGACTCCGTTCGCACCACCGCCGC 
AAACCACCAACGCGACCGGCCTCGTCGGCCACGCAACAGCGGTGGCCGCGCTGCGGGGGCAG 
CACAGCTGGGCCGCGGCGATTCCATGGAGCGACATACAGAAATACTGGATGATGTTCCTGGGC 
GCCCTCGCCACTGCCGAAGGGTTCATTTACGACAGCGGTGGGTTAACGCTGAATGCTCTGCAGT
TCGTCGGCGGGATGTTGTGGAGCACCGCATTGGCAGAAGCCGGTGCGGCCGAGGCAGCGGCC 
GGCGCGGGTGGAGCCGCTGGATGGTCGGCGTGGTCGCAGCTGGGAGCTGGACCGGTGGCGG 
CGAGCGCGACTCTGGCCGCCAAGATCGGACCGATGTCGGTGCCGCCGGGCTGGTCCGCACCG 
CCCGCCACGCCCCAGGCGCAAACCGTCGCGCGATCGATTCCCGGTATTCGCAGCGCCGCCGA 
GGCGGCTGAAACATCGGTCCTACTCCGGGGGGCACCGACTCCGGGCAGGAGTCGCGCCGCCC
ATATGGGACGCCGATATGGAAGACGACTCACCGTGATGGCTGACCGGCCGAACGTCGGATAG 

>Rv1745c - similar to Q46822 ORFJD1 82 TB.seq 1971381:1971989 MW:22490
>emb|AL123456|MTBH37RV:c1971989-1971378, Rv1745c SEQ ID NO:60

ATGACCCGCAGCTACCGGCCAGCTCCACCGATCGAGCGGGTGGTTTTGCTCAACGACCGCGGC
GACGCGACAGGTGTGGCCGACAAGGCCACCGTGCACACCGGCGACACCCCTTTGCACCTCGC 
GTTCTCCAGCTATGTGTTCGATCTGCACGATCAGCTGTTGATCACGCGGCGGGCCGCCACCAAG 
AGGACGTGGCCGGCGGTATGGACCAACAGTTGCTGCGGGCACCCCCTGCCTGGCGAATCGCT 
ACCCGGCGCCATACGCCGGCGGCTCGCTGCCGAACTCGGACTGACCCCAGATCGGGTCGATC 

TGATCCTGCCGGGGTTCCGCTACCGGGCCGCTATGGCCGATGGCACCGTGGAAAACGAGATCT
GCCCCGTCTACCGAGTCCAGGTTGACCAACAGCCCCGGCCGAACTCGGACGAGGTCGACGCG
ATCCGCTGGTTGTCCTGGGAACAATTCGTGCGCGATGTTACCGCCGGCGTAATCGCCCCGGTAT 
CCCCTTGGTGCCGCTCACAACTGGGCTACCTGACCAAACTTGGACCATGTCCGGCACAGTGGC 
CCGTGGCCGACGACTGCCGGCTACCGAAAGCCGCACATGGTAATTAA 


>Rv1800 - TB.seq 2039451:2041415 MW:67068 
>emb|AL123456|MTBH37RV:2039451-2041418, PPE SEQ ID NO:61

ATGCTGCCGAATTTCGCGGTGCTGCCCCCCGAGGTCAATTCGGCGAGGGTGTTCGCCGGTGCG 
GGGTCGGCGCCGATGTTAGCGGCAGCGGCCGCCTGGGATGATCTAGCCTCCGAGCTGCATTGT 

GCTGCAATGTCATTCGGGTCGGTTACGTCGGGATTGGTGGTTGGGTGGTGGCAGGGATCGGCG
TCGGCGGCGATGGTGGACGCAGCCGCGTCGTACATCGGGTGGCTGAGCACGTCGGCTGCCCA 
CGCCGAGGGCGCGGCCGGTCTGGCTCGGGCCGCGGTATCGGTGTTCGAGGAGGCGCTGGCC 
GCGACGGTGCATCCGGCGATGGTTGCGGCAAATCGCGCCCAGGTGGCGTCGCTGGTAGCGTC 
GAACTTGTTTGGGCAGAACGCGCCTGCGATCGCCGCGCTCGAATCCTTGTATGAGTGTATGTGG 

GCCCAGGATGCAGCGGCCATGGCGGGTTATTACGTTGGGGCTTCGGCGGTGGCCACACAGTTG
GCATCGTGGCTGCAACGGCTACAGAGCATCCCCGGCGCCGCCAGTCTTGATGCCCGTCTGCCG 
AGCTCGGCCGAGGCACCGATGGGAGTCGTCCGCGCGGTCAACAGCGCGATCGCCGCCAATGC 
GGCTGCGGCACAAACCGTTGGCCTGGTCATGGGAGGCAGCGGCACGCCAATACCGTCGGCCA
GATATGTCGAGCTCGCGAACGCGCTGTACATGAGTGGCAGCGTCCCGGGTGTTATCGCGCAGG 
CGCTCTTCACGCCCCAAGGGCTCTACCCGGTGGTCGTGATCAAGAACCTCACTTTCGATTCCTC 
GGTGGCGCAGGGTGCCGTCATTCTCGAAAGTGCGATTCGGCAGCAAATTGCCGCCGGCAACAA 
CGTCACCGTCTTCGGCTACTCGCAGAGCGCCACGATCTCGTCACTAGTGATGGCCAATCTTGCG
GCTTCGGCCGACCCGCCGTCTCCAGACGAGCTTTCCTTCACGCTGATCGGCAATCCCAACAACC 
CCAATGGCGGGGTTGCCACCAGGTTCCCGGGGATCTCCTTTCCAAGCTTGGGCGTGACGGCCA 
CCGGGGCCACTCCGCACAATCTGTACCCGACCAAGATCTACACCATCGAATACGACGGCGTCG 
CCGACTTTCCGCGGTACCCGCTCAACTTTGTGTCGACCCTCAACGCCATTGCCGGCACCTACTA 

CGTGCACTCCAACTACTTCATCCTGACGCCGGAACAAATTGACGCAGCGGTTCCGCTGACCAAT
ACGGTCGGTCCCACGATGACCCAGTACTACATCATTCGCACGGAGAACCTGCCGCTGCTAGAG 
CCACTGCGATCGGTGCCGATCGTGGGGAACCCACTGGCGAACCTGGTTCAACCAAACTTGAAG 
GTGATTGTTAACCTGGGCTACGGCGACCCGGCCTATGGTTATTCGACCTCGCCGCCCAATGTTG 
CGACTCCGTTCGGGTTGTTCCCAGAGGTCAGCCCGGTCGTCATCGCCGACGCTCTCGTCGCCG 

GGACCCAGCAGGGAATCGGCGATTTCGCCTACGACGTCAGCCACCTCGAACTGCCGTTGCCGG
CAGACGGGTCGACGATGCCAAGCACCGCACCGGGCTCGGGTACGCCGGTCCCCCCGCTCTCG 
ATCGACAGCCTGATAGACGACCTGCAGGTGGCTAACCGCAACCTCGCCAACACGATTTCGAAG 
GTGGCCGCGACGAGCTACGCGACGGTGCTCCCAACCGCCGACATCGCCAATGCGGCGTTGAC 
GATCGTGCCGTCGTACAACATCCACCTTTTTTTGGAGGGCATCCAGCAAGCGCTCAAGGGCGAC 

CCGATGGGACTCGTCAACGCGGTCGGATACCCACTCGCGGCCGACGTGGCACTGTTCACGGCC
GCAGGCGGTCTTCAGCTCTTGATCATCATCAGCGCGGGCCGAACGATTGCCAATGACATCTCGG 
CCATTGTCCCCTGA 

>Rv1844c gnd 6-phosphogluconate dehydrogenase (Gram -) TB.seq 2093732:2095186 MW:51548
>emb|AL123456|MTBH37RV:c2095186-2093729, gnd SEQ ID NO:62

ATGAGTTCGTCGGAATCGCCAGCCGGCATCGCGCAGATCGGCGTCACTGGCCTGGCCGTGATG 
GGTTCCAACATCGCCCGAAACTTCGCCCGGCACGGCTACACCGTGGCAGTGCACAATCGGTCG 
GTCGCCAAGACCGACGCGCTGCTTAAGGAGCACAGCTCAGACGGCAAGTTCGTGCGCAGTGAA 
ACGATCCCCGAATTTCTTGCCGCACTGGAAAAACCGCGTCGGGTGCTGATCATGGTCAAGGCC 
GGAGAGGCCACTGACGCTGACGCTGTCATCAACGAACTTGCTGACGCCATGGAACCCGGCGAC
ATCATCATCGACGGCGGCAATGCGTTGTACACCGACACCATGCGCCGCGAGAAAGCGATGCGT 
GAGCGGGGCTTGCACTTCGTCGGGGCCGGGATCTCCGGCGGCGAAGAGGGCGCGTTGAACGG 
GCCGTCGATCATGCCCGGCGGACCCGCCGAGTCATACCAATCGCTGGGTCCGCTGCTCGAGGA 
GATCTCCGCGCATGTCGACGGCGTGCCGTGCTGCACCCACATTGGCCCGGACGGCTCCGGGC 
ACTTCGTCAAGATGGTCCACAACGGCATCGAGTACTCCGACATGCAGCTCATCGGTGAGGCCTA
CCAGCTGATGCGCGACGGGCTAGGTCTGACCGCGCCGGCGATCGCCGATGTGTTCACCGAGT 
GGAACAATGGCGATCTGGACAGCTACCTGGTCGAGATCACCGCCGAGGTGCTGCGGCAGACCG 
ATGCCAAGACCGGCAAACCGCTCGTCGACGTCATCGTGGACCGGGCCGAGCAGAAAGGCACC

GGCCGTTGGACCGTCAAGTCCGCGCTGGACCTGGGTGTGCCGGTGACCGGCATCGCCGAAGC 

GGTGTTTGCCCGCGCTCTCTCGGGATCCGTGGGGCAACGCTCGGCCGCCAGCGGTCTGGCTTC 

GGGCAAGCTCGGCGAGCAGCCCGCCGACCCCGCCACGTTCACCGAAGACGTCCGCCAGGCGT 

TGTACGCCTCCAAGATCGTGGCCTACGCTCAGGGCTTCAACCAGATCCAGGCCGGCAGCGCCG 

AATTCGGCTGGGACATCACGCCGGGCGACCTGGCCACCATCTGGCGTGGCGGCTGCATCATCC 

GGGCGAAGTTCCTCAACCACATCAAGGAAGCCTTTGACGCCAGCCCGAACCTGGCCAGTCTGA 

TTGTGGCCCCGTATTTCCGCGGCGCCGTCGAATCGGCGATCGACAGTTGGCGGCGTGTGGTGT 

C(3ACGGCGGCCCAACTGGGTATCCCGACCCCGGGATTCTCGTCGGCCCTGTCGTATTACGACG 

CGCTGCGCACCGCGCGGCTGCCCGCTGCACTCACCCAGGCCCAGCGCGACTTCTTCGGCGCA 

CACACCTACGGCCGGATCGACGAACCAGGCAAGTTCCACACACTATGGAGTTCAGACCGCACC 

GAAGTACCGGTGTAG 

>Rv1900c lipJ TB.seq 2146246:2147631 MW:49685
>emb|AL123456|MTBH37RV:c2147631-2146243, lipJ SEQ ID NO:63 

GTGGCGCAGGCTCCCCACATTCACAGGACCCGCTACGCAAAATGCGGCGACATGGATATCGCC 

TACCAGGTGCTGGGTGACGGTCCGACGGATCTGCTGGTGTTGCCGGGGCCGTTCGTGCCGATC 

GACTCGATCGACGACGAGCCATCGCTGTACCGTTTCCATCGCCGTCTTGCGTCATTCAGCAGGG 

TGATCCGCCTCGACCATCGTGGGGTCGGCCTGTCGTCACGGCTCGCCGCGATAACCACGCTGG 

GGCCGAAGTTCTGGGCCCAGGACGCGATCGCGGTGATGGACGCGGTCGGATGCGAGCAGGCG 

ACAATTTTCGCGCCCAGTTTCCACGCCATGAACGGACTTGTTCTCGCCGCCGACTACCCCGAGC 

GGGTGCGCAGCCTGATCGTCGTCAACGGCTCGGCGCGCCCACTATGGGCGCCCGACTACCCG 

GTAGGCGCCCAGGTTCGTCGAGCTGACCCGTTCCTGACGGTGGCGCTGGAACCGGATGCCGTC 

GAGCGGGGCTTCGACGTGCTGAGCATCGTGGCTCCTACCGTGGCCGGAGATGACGTGTTTCGA 

GCCTGGTGGGATCTCGCCGGCAACCGTGCCGGACCGCCGAGCATTGCCCGTGCCGTTTCAAAG 

GTCATAGCCGAGGCCGACGTACGAGATGTCTTGGGACACATCGAGGCTCCAACACTGATCTTGC 

ACCGTGTCGGATCGACGTACATCCCGGTGGGACATGGTCGCTACCTCGCCGAGCACATCGCTG 

GATCCCGCTTGGTCGAACTACCCGGCACCGATACCCTGTACTGGGTTGGCGACACCGGGCCGA 

TGCTCGATGAAATCGAGGAATTCATCACCGGCGTGCGCGGCGGCGCTGACGCCGAGCGCATGC 

TTGCCACCATCATGTTTACCGACATCGTCGGCTCGACCCAGCACGCCGCCGCGCTCGGCGACG 

ACCGATGGCGCGACCTGTTGGACAACCACGACACCATCGTGTGCCACGAAATCCAGCGGTTCG 

GCGGTCGCGAAGTGAACACGGCCGGTGACGGTTTCGTCGCGACGTTCACCAGTCCGAGTGCC 

GCGATCGCGTGCGCGGACGACATCGTCGACGCGGTCGCCGCGCTGGGTATTGAGGTCCGGAT 

CGGTATTCATGCGGGCGAGGTCGAGGTGCGCGATGCCTCGCACGGTACCGACGTCGCCGGCG 

TGGCCGTGCATATCGGTGCGCGCGTCTGCGCGCTGGCCGGACCCAGTGAGGTGCTGGTGTCC 

TCGACCGTGCGAGACATCGTCGCCGGATCACGGCACCGGTTCGCCGAGCGTGGTGAGCAGGA 
ACTCAAGGGCGTACCGGGCAGATGGCGGCTATGCGTGCTCATGCGCGACGACGCCACCCGCA
CGCGCTAA 

>Rv1967 - TB.seq 2210599:2211624 MW:36516
>emb|AL123456|MTBH37RV:2210599-2211627, Rv1967 SEQ ID NO:64

ATGAGGGAGAACCTGGGGGGCGTCGTGGTGCGCCTCGGCGTCTTCCTGGCGGTATGCCTGCT 
GACGGCGTTCCTGCTGATTGCCGTCTTCGGGGAGGTGCGCTTCGGCGACGGCAAGACCTACTA 
CGCCGAGTTCGCCAACGTGTCCAATCTGCGAACGGGCAAGCTGGTGCGCATCGCCGGCGTCGA 
GGTCGGCAAGGTCACCAGGATCTCCATCAACCCCGACGCGACGGTGCGGGTGCAGTTCACCGC 

CGACAACTCGGTCACCCTCACGCGGGGCACCCGGGCGGTGATCCGCTACGACAACCTGTTCGG
TGACCGCTATTTGGCGCTGGAGGAAGGGGCCGGCGGACTCGCCGTTCTTCGTCCCGGTCACAC 
GATTCCGTTGGCGCGCACCCAACCGGCGTTGGATCTGGATGCCCTGATCGGTGGATTCAAGCC 
GCTGTTTCGTGCGCTGAACCCCGAGCAGGTCAACGCGCTGAGCGAACAGTTGCTGCACGCGTT 
TGCCGGACAGGGGCCCACGATCGGGTCATTGCTGGCCCAGTCCGCGGCCGTGACCAACACCC 

TGGCCGACCGTGATCGGCTGATCGGGCAGGTGATCACCAACCTCAACGTGGTGCTGGGCTCGC
TGGGCGCTCACACCGATCGGTTGGACCAGGCGGTGACGTCGCTATCAGCGTTGATTCACCGGC 
TCGCGCAACGCAAGACCGACATCTCCAACGCCGTGGCCTACACCAACGCCGCCGCCGGCTCG 
GTCGCCGATCTGCTGTCGCAGGCTCGCGCGCCGTTGGCGAAGGTGGTTCGCGAGACCGATCG 
GGTGGCCGGCATCGCGGCCGCCGACCACGACTACCTCGACAATCTGCTCAACACGCTGCCGGA 

CAAATACCAGGCGCTGGTCCGCCAGGGTATGTACGGCGACTTCTTCGCCTTCTACCTGTGCGAC
GTCGTGCTCAAGGTCAACGGCAAGGGCGGCCAGCCGGTGTACATCAAGCTGGCCGGTCAGGA 
CAGCGGGCGGTGCGCGCCGAAATGA 

>Rv1975 - TB.seq 2218050:2218712 MW:23650 
>emb|AL123456|MTBH37RV:2218050-2218715, Rv1975 SEQ ID NO:65

ATGTCGCGTCGAGCATCGGCCACGTGTGCCTTGTCCGCGACCACCGCCGTCGCCATAATGGCT 
GCTCCCGCCGCACGGGCCGACGACAAGCGGCTCAACGACGGCGTGGTCGCCAACGTCTACAC 
CGTTCAACGTCAGGCCGGCTGCACCAACGACGTCACGATCAACCCGCAACTACAATTGGCCGC 
CCAATGGCACACCCTCGATCTGCTGAACAACCGGCACCTCAACGACGACACCGGTTCTGACGG 
ATCCACACCGCAAGACCGCGCGCATGCCGCCGGCTTCCGCGGGAAAGTCGCTGAAACCGTGG
CGATCAATCCCGCCGTAGCGATCAGCGGCATCGAGTTGATAAACCAGTGGTACTACAACCCCGC 
GTTTTTCGCGATCATGTCCGACTGCGCCAACACCCAGATCGGGGTGTGGTCAGAAAACAGCCC 
GGATCGCACCGTCGTGGTGGCCGTTTACGGACAGCCCGATCGACCTTCCGCGATGCCGCCCAG 
GGGAGCGGTAACCGGACCGCCGTCCCCGGTGGCCGCGCAAGAGAACGTTCCTATCGACCCCA 
GCCCCGACTACGACGCCAGCGACGAGATCGAATACGGCATCAACTGGCTGCCATGGATCCTGC
GCGGCGTGTACCCGCCGCCCGCAATGCCGCCGCAGTAG 
>Rv1981c nrdF ribonucleotide reductase small subunit TB.seq 2224221:2225186 MW:36591
>emb|AL123456|MTBH37RV:c2225186-2224218, nrdF SEQ ID NO:66

ATGACCGGCAAGCTCGTTGAGCGGGTGCACGCAATCAATTGGAACCGGTTGCTCGATGCTAAA 
GATTTGCAGGTCTGGGAACGTTTGACCGGTMCTTTTGGTTGCCGGAAAAGATTCCGCTCTCCA 
ACGACCTGGCATCTTGGCAAACGTTGAGTTCCACCGAGCAGCAGACGACGATCCGGGTGTTCA
CCGGCTTGACCCTGCTCGACACCGCGCAGGCGACGGTGGGAGCAGTGGCCATGATCGACGAC 
GCGGTCACCCCCCACGAAGAGGCGGTCCTGACCAACATGGCGTTCATGGAGTCAGTGCACGCC 
AAGAGCTACAGCTCGATCTTCTCGACCCTGTGCTCGACCAAGCAGATCGACGATGCCTTCGACT 
GGTCGGAACAGAACCCTTACCTGCAGCGAAAAGCGCAGATCATCGTCGACTACTACCGCGGTG 

ACGACGCGCTCAAGCGCAAAGCATCGTCGGTAATGCTGGAGTCCTTCCTGTTCTACTCCGGCTT
CTACCTGCCCATGTACTGGTCGTCGCGGGGTAAGCTCACCAACACCGCCGATCTGATCCGGCT 
GATCATCCGAGATGAAGCCGTCCACGGCTACTACATCGGCTACAAATGTCAACGAGGTTTGGCC 
GACCTGACCGACGCCGAGCGGGCCGACCACCGCGAATACACCTGCGAGCTGCTGCACACGCT 
CTACGCGAACGAGATCGACTATGCGCACGACTTGTACGACGAGTTGGGCTGGACCGACGACGT 

TTTGCCCTACATGCGTTACAACGCCAACAAGGCGCTAGCCAACCTGGGATACCAGCCTGCATTC
GATCGTGACACCTGCCAGGTGAACCCGGCCGTGCGCGCAGCTCTCGACCCCGGTGCAGGGGA 
GAACCACGACTTTTTCTCCGGCTCCGGAAGCTCATACGTAATGGGCACCCACCAACCCACCACC 
GACACCGACTGGGACTTCTAA 

>Rv2092c helY helicase, Ski2 subfamily TB.seq 2349335:2352052 MW:99576
>emb|AL123456|MTBH37RV:c2352052-2349332, helY SEQ ID NO:67 

GTGACTGAGCTGGCCGAGCTGGACCGGTTCACCGCGGAACTACCGTTCTCGCTCGACGACTTT 
CAGCAGCGGGCTTGCAGCGCGCTGGAACGCGGCCACGGTGTGCTGGTGTGCGCGCCGACCG 
GCGCTGGCAAGACGGTGGTCGGCGAGTTCGCCGTGCACCTGGCGCTGGCGGCCGGCAGTAAA 

TGTTTCTACACCACGCCGCTGAAAGCCCTGAGCAACCAAAAGCACACCGATCTCACAGCACGCT
ACGGCCGTGACCAGATCGGGCTGCTGACCGGTGACCTGTCGGTCAACGGCAACGCGCCGGTG 
GTGGTGATGACCACCGAAGTGCTGCGCAACATGCTCTACGCGGATTCGCCTGCGCTGCAGGGG 
CTTTCCTATGTGGTGATGGATGAGGTGCATTTCCTCGCCGACCGGATGCGGGGTCCGGTGTGG 
GAGGAGGTGATCCTGCAACTGCCCGACGACGTGCGGGTGGTCAGCCTGTCGGCGACGGTGAG 

CAACGCCGAGGAGTTCGGCGGTTGGATCCAGACGGTGCGGGGCGACACCACGGTGGTGGTCG
ACGAGCATCGGCCGGTGCCGTTGTGGCAACACGTCTTGGTGGGCAAGCGCATGTTCGACCTGT 
TCGATTACCGGATCGGCGAAGCCGAAGGGCAGCCCCAAGTCAACCGCGAGTTGCTGCGCCACA 
TCGCGCATCGCCGTGAGGCCGACCGGATGGCCGATTGGCAGCCTCGGCGCCGAGGCTCGGGC 
CGGCCCGGCTTCTACCGGCCACCCGGCCGACCCGAGGTGATCGCCAAACTCGACGCTGAAGG 

GCTGTTGCCGGCGATCACCTTCGTGTTCTCCCGGGCCGGTTGTGACGCCGCGGTCACCCAATG
CCTGCGGTCACCGCTGCGGTTGACCAGCGAAGAGGAGCGCGCACGGATCGCCGAGGTGATCG 
ACCACCGCTGCGGTGACCTGGCCGACTCCGACCTGGCGGTACTCGGCTACTACGAATGGCGG 
GAAGGGTTACTGCGCGGTCTGG(XGCCCACCACGCGGGCATGTTGCCGGCCTTCCGGCACAC
GGTGGAGGAGCTGTTCACCGCCGGTTTGGTCAAGGCTGTATTCGCCACCGAGACTCTGGCGCT 
CGGTATCAACATGCCGGCCCGCACGGTGGTGCTGGAGCGGCTGGTGAAGTTCAACGGTGAGCA 
GCACATGCCGCTGACGCCGGGGGAGTACACCCAACTGACCGGTCGCGCCGGCCGGCGCGGTA 
TCGACGTCGAGGGTCACGCGGTGGTGATCTGGCACCCGGAAATTGAACCGTCCGAGGTGGCG
GGCCTGGCCTCCACCCGCACCTTTCCGCTGCGCAGCTCGTTTGCCCCGTCGTACAACATGACG 
ATCAACCTGGTGCACCGGATGGGTCCGCAACAGGCGCACCGACTGCTCGAGCAGTCGTTCGCC 
CAATATCAGGCCGACCGATCCGTGGTCGGACTGGTCCGCGGAATTGAGCGGGGCAACAGGATA 
CTCGGCGAGATCGCAGCCGAACTGGGCGGATCTGATGCGCCCATCCTCGAATACGCTCGATTG 

CGCGCGCGGGTGTCCGAGCTGGAACGTGCGCAGGCCCGCGCGTCGCGGTTACAGCGACGGC
AGGCGGCCACCGATGCGCTGGCCGCGCTGCGCCGCGGTGACATCATCACCATCACCCACGGC 
CGCCGCGGTGGTCTGGCCGTCGTCCTGGAATCAGCCCGCGACCGCGACGACCCGCGTCCGCT 
GGTGCTAACCGAACACCGATGGGCGGGACGGATCTCCTCGGCCGACTACTCGGGCACGACGC 
CGGTGGGGTCGATGACGCTGCCCAAGCGGGTGGAGCACCGCCAGCCGCGGGTCCGGCGTGA 

CCTGGCCTCGGCGCTGCGATCGGCAGCCGCGGGTCTGGTTATTCCAGCCGCCCGGCGCGTCA
GCGAGGCCGGCGGGTTTCACGATCCGGAGCTGGAGTCGTCGCGCGAACAATTGCGCCGTCAT 
CCGGTGCATACCTCGCCCGGGCTCGAGGACCAGATCCGCCAGGCCGAGCGTTACTTACGCATC 
GAACGCGACAACGCGCAATTAGAGAGGAAGGTCGCCGCCGCCACCAACTCGTTGGCCCGCAC 
GTTCGACCGATTCGTCGGGCTGCTCACCGAACGGGAGTTCATCGATGGCCCGGCCACTGATCC 

CGTGGTCACCGACGACGGCCGGCTGCTGGCGCGGATTTACAGCGAGAGCGACCTGTTGGTGG
CCGAGTGCCTACGTACAGGTGCGTGGGAGGGTTTAAAGCCGGCCGAATTGGCGGGGGTGGTG 
TCGGCGGTGGTCTACGAGACGCGCGGTGGTGACGGCCAGGGCGCCCCGTTCGGAGCCGATGT 
GCCCACACCGCGGTTACGGCAGGCTCTGACTCAGACATCAAGGCTGTCCACGACATTGCGCGC 
CGACGAGCAGGCACACCGCATCACCCCGAGTCGCGAACCCGACGATGGCTTTGTCAGAGTCAT 

CTACCGCTGGTCGCGAACCGGTGATCTAGCGGCGGCATTGGCCGCTGCCGACGTGAACGGCA
GCGGATCACCGTTATTGGCAGGGGATTTCGTGCGTTGGTGCCGTCAGGTGCTCGATCTGCTGG 
ACCAAGTTCGTAACGCTGCGCCCAACCCCGAACTGCGGGCTACCGCAAAGCGCGCTATCGGTG 
ACATTCGGCGCGGCGTCGTCGCGGTTGACGCCGGGTAG 

>Rv2101 helZ helicase, Snf2/Rad54 family TB.seq 2360238:2363276 MW:111632
>emb|AL123456|MTBH37RV:2360238-2363279, helZ SEQ ID NO:68 

ATGCTGGTTTTGCACGGCTTCTGGTCCAACTCCGGCGGGATGCGGCTGTGGGCGGAGGACTCC 
GATCTGCTGGTGAAGAGCCCGAGTCAGGCGCTGCGCTCCGCGCGGCCACACCCGTTCGCGGC 
GCCGGCTGACCTGATCGCCGGCATACATCCGGGCAAACCCGCAACCGCCGTTTTGCTGTTGCC 
GTCGTTGCGATCGGCGCCGCTGGACTCGCCGGAGCTGATCCGGCTCGCCCCGCGCCCGGCCG
CGCGAACCGATCCGATGCTGTTGGCGTGGACGGTACCGGTGGTGGACCTGGACCCCACCGCG 
GCGTTGGCCGCCTTCGACCAGCCCGCCCCCGACGTCCGCTACGGCGCGTCCGTCGACTACCT 
GGCCGAGCTGGCCGTTTTCGCGCGCGAGTTGGTCGAGCGTGGTCGCGTGCTGCCCCAGCTGC

GCCGCGACACCCACGGCGCGGCCGCCTGCTGGCGTCCGGTGTTGCAGGGACGCGACGTGGTC 

GCGATGACCTCGCTGGTCTCGGCGATGCCGCCGGTCTGCCGCGCCGAAGTTGGTGGGCACGA 

CCCGCACGAACTGGCAACCTCGGCTCTGGACGCGATGGTCGACGCCGCCGTGCGCGCGGCGC 

TGTCACCGATGGACCTGCTGCCCCCGCGACGGGGTCGCTCCAAACGGCATCGGGCCGTGGAG 

GCTTGGCTGACCGCGTTGACCTGCCCGGACGGCCGGTTCGACGCGGAGCCCGACGAACTCGA 

CGCGCTGGCCGAGGCGTTGCGGCCATGGGACGACGTCGGTATCGGCACCGTCGGCCCGGCGC 

GGGCGACGTTTCGGCTGTCCGAAGTCGAGACCGAAAACGAGGAGACGCCCGCGGGCTCGTTG 

TGGAGGCTGGAGTTCTTATTGCAGTCGACGCAGGACCCCAGCCTGCTGGTCCCCGCCGAGCAG 

GCATGGAACGACGACGGCAGCCTGCGCCGCTGGCTGGACCGGCCGCAGGAGCTGCTGCTGAC 

CGAACTGGGCCGGGCCTCTCGGATTTTCCCCGAGCTCGTCCCGGCGCTGCGCACCGCGTGCC 

CGTCCGGGCTTGAGCTCGACGCCGACGGCGCCTACCGATTCCTGTCGGGTACGGCCGCGGTG 

CTCGACGAGGCTGGGTTTGGCGTGCTGCTGCCGTCCTGGTGGGACCGCCGCCGCAAGCTGGG 

CTTGGTCCTGTCCGCATATACCCCGGTCGACGGCGTGGTGGGCAAGGCCAGCAAGTTCGGCCG 

CGAGCAGCTCGTCGAGTTCCGCTGGGAGCTGGCCGTGGGCGACGATCCGCTCAGCGAGGAGG 

AGATCGCGGCGCTGACCGAAACCAAGTCCCCGCTGATCCGGCTGCGTGGCCAGTGGGTCGCG 

CTCGATACCGAACAGATGCGCCGCGGGCTGGAGTTTTTGGAGCGTAAGCCAACCGGCCGCAAG 

ACCACCGCCGAGATCCTCGCGCTGGCCGCCAGCCACCCCGACGACGTGGACACCCCGCTCGA 

GGTCACCGCCGTACGCGCCGACGGCTGGCTCGGGGACCTGCTCGCCGGGGCCGCCGCGGCG 

TCGCTGCAGCCGTTGGACCCGCCCGACGGATTCACCGCGACGCTGCGTCCCTACCAGCAGCGC 

GGTCTGGCGTGGCTGGCGTTTTTGTCCTCGCTCGGTTTGGGCAGCTGCCTGGCCGACGACATG 

GGCCTGGGCAAGACGGTGCAGCTATTGGCCCTGGAAACCTTGGAATCCGTTCAGCGCCACCAG 

GATCGCGGCGTCGGACCCACACTGCTACTGTGCCCGATGTCGTTGGTGGGCAACTGGCCGCAG 

GAAGCGGCCAGGTTTGCACCCAACCTGCGGGTGTACGCCCACCACGGGGGCGCCCGGCTGCA 

CGGCGAGGCGTTGCGCGACCACCTCGAGCGCACCGACCTGGTCGTGAGCACCTATACCACCG 

CCACCCGCGACATCGACGAGCTGGCGGAATACGAATGGAACCGGGTGGTGCTGGACGAGGCC 

CAGGCGGTGAAGAACAGCCTGTCCCGGGCGGCCAAGGCGGTGCGACGGCTACGCGCGGCGC 

ACCGGGTCGCGCTGACCGGGACACCGATGGAGAACCGGCTCGCCGAGCTGTGGTCGATCATG 

GACTTCCTCAACCCGGGCCTGCTCGGATCCTCCGAACGCTTCCGCACCCGCTACGCGATCCCG 

ATCGAGCGGCACGGGCACACCGAACCGGCCGAACGGCTGCGCGCATCGACGCGGCCCTACAT 

CCTGCGCCGGCTCAAGACCGACCCGGCGATCATCGACGATCTGCCGGAGAAGATCGAGATCAA 

GCAGTACTGCCAACTCACCACCGAGCAGGCGTCGCTGTATCAGGCCGTCGTCGCCGACATGAT 

GGAAAAGATCGAAAACACCGAAGGGATCGAGCGGCGCGGCAACGTGCTGGCCGCGATGGCCA 

AGCTCAAACAGGTGTGCAACCACCCCGCCCAGCTGCTGCACGATCGCTCCCCGGTCGGTCGGC 

GGTCCGGGAAGGTGATCCGGCTCGAGGAGATCCTGGAAGAGATCCTGGCCGAGGGCGACCGG 

GTGCTGTGTTTTACCCAGTTCACCGAGTTCGCCGAGCTGCTGGTGCCGCACCTGGCCGCACGC 

TTCGGCCGTGCCGCCCGAGACATTGCCTACCTGCACGGTGGCACCCCGAGGAAGCGGCGTGA 

CGAGATGGTGGCCCGGTTCCAGTCCGGTGACGGCCCGCCCATTTTTCTGCTGTCGTTGAAGGC
GGGCGGTACCGGGCTGAACCTCACCGCCGCCAATCATGTTGTGCACCTGGACCGCTGGTGGAA 
CCCGGCGGTCGAGAACCAGGCGACGGACCGGGCGTTTCGGATCGGGCAGCGGCGCACGGTG 
CAGGTCCGCAAGTTCATCTGCACCGGCACCCTCGAGGAGAAGATCGACGAAATGATCGAGGAG 
AAAAAGGCGCTGGCCGACTTGGTGGTCACCGACGGCGAAGGCTGGCTGACCGAACTGTCCACC
CGCGATCTGCGCGAGGTGTTCGCGCTGTCCGAAGGCGCCGTCGGTGAGTAG 

>Rv2110c prcB proteasome [beta]-type subunit 2 TB.seq 2369727:2370599 MW:30274
>emb|AL123456|MTBH37RV:c2370599-2369724, prcB SEQ ID NO:69 

GTGACCTGGCCGTTGCCCGATCGCCTGTCCATTAATTCACTCTCTGGAACACCCGCTGTAGACC
TATCTTCTTTCACTGACTTCCTGCGCCGCCAGGCGCCGGAGTTGCTGCCGGCAAGCATCAGCG 
GCGGTGCGCCACTCGCAGGCGGCGATGCGCAACTGCCGCACGGCACCACCATTGTCGCGCTG 
AAATACCCCGGCGGTGTTGTCATGGCGGGTGACCGGCGTTCGACGCAGGGCAACATGATTTCT 
GGGCGTGATGTGCGCAAGGTGTATATCACCGATGACTACACCGCTACCGGCATCGCTGGCACG 

GCTGCGGTCGCGGTTGAGTTTGCCCGGCTGTATGCCGTGGAACTTGAGCACTACGAGAAGCTC
GAGGGTGTGCGGCTGACGTTTGCCGGCAAAATCAACCGGCTGGCGATTATGGTGCGTGGCAAT 
CTGGCGGCCGCGATGCAGGGTCTGCTGGCGTTGCCGTTGCTGGCGGGCTACGACATTCATGCG
TCTGACCCGCAGAGCGCGGGTCGTATCGTTTCGTTCGACGCCGCCGGCGGTTGGAACATCGAG 
GAAGAGGGCTATCAGGCGGTGGGCTCGGGTTCGCTGTTCGCGAAGTCGTCGATGAAGAAGTTG 

TATTCGCAGGTTACCGACGGTGATTCGGGGCTGCGGGTGGCGGTCGAGGCGCTCTACGACGCC
GCCGACGACGACTCCGCCACCGGCGGTCCGGACCTGGTGCGGGGCATCTTTCCGACGGCGGT 
GATCATCGACGCCGACGGGGCGGTTGACGTGCCGGAGAGCCGGATTGCCGAATTGGCCCGCG 
CGATCATCGAAAGCCGTTCGGGTGCGGATACTTTCGGCTCCGATGGCGGTGAGAAGTGA 

>Rv2118c - = B2126_C1_165 (83.6%) TB.seq 2377471:2378310 MW:30091
>emb|AL123456|MTBH37RV:c2378310-2377468, Rv2118c SEQ ID NO:70 

GTGTCAGCAACCGGCCCATTCAGCATCGGCGAACGTGTTCAGCTCACCGACGCTAAGGGGCGC 
CGCTACACCATGTCGCTGACTCCCGGTGCCGAATTCCACACTCATCGTGGCTCGATCGCCCACG 
ACGCGGTGATCGGGTTGGAGCAAGGCAGCGTGGTCAAATCCAGCAACGGCGCCCTGTTCCTGG 

TGCTGCGCCCGCTGCTGGTCGACTACGTCATGTCGATGCCGCGCGGCCCGCAGGTGATCTATC
CCAAAGATGCGGCCCAGATCGTGCATGAGGGCGACATATTTCCCGGCGCGCGGGTGCTGGAG 
GCAGGAGCCGGATCCGGTGCTCTGACCTTGTCTTTGCTGCGGGCGGTTGGGCCGGCCGGACA 
GGTGATCTCCTACGAACAGCGCGCCGATCATGCCGAACACGCCCGGCGCAATGTGAGCGGCTG 
CTACGGCCAGCCGCCGGACAACTGGCGACTGGTCGTCAGCGACCTCGCCGACTCCGAACTGC 

CCGACGGATCCGTTGATCGGGCCGTGCTCGACATGCTGGCGCCGTGGGAGGTGCTCGACGCG
GTATCGCGGCTGCTGGTCGCCGGCGGAGTGCTGATGGTCTACGTGGCCACCGTCACTCAGCTG 
TCGAGGATCGTGGAGGCACTGCGGGCCAAGCAGTGCTGGACCGAACCGAGAGCCTGGGAGAC 
GCTGCAGCGGGGCTGGAACGTCGTAGGGTTGGCGGTTCGGCCGCAGCATTCGATGCGCGGGC
ATACCGCGTTCCTGGTAGCAACGCGCCGGTTGGCGCCGGGGGCTGTGGCTCCGGCGCCGCTA 
GGTCGTAAGCGCGAGGGACGCGACGGGTAG 

>Rv2144c - TB.seq 2404166:2404519 MW:12028

>emb|AL123456|MTBH37RV:c2404519-2404163, Rv2144c SEQ ID NO:71 

ATGCTGATCATTGCGCTGGTCTTGGCCCTGATTGGGCTCCTGGCCTTGGTGTTCGCGGTGGTCA 
CCAGCAACCAGCTAGTGGCCTGGGTATGCATCGGGGCCAGCGTGCTGGGTGTGGCGTTGCTGA 
TCGTCGATGCGTTGCGAGAACGCCAGCAAGGTGGCGCGGACGAAGCTGATGGGGCTGGGGAA 
ACGGGTGTCGCGGAGGAAGCCGACGTCGACTACCCGGAGGAAGCCCCCGAGGAGAGCCAAGC
CGTCGACGCCGGTGTCATCGGCAGTGAGGAGCCATCGGAGGAGGCCAGCGAAGCGACCGAGG 
AGTCGGCGGTATCGGCGGACCGAAGCGACGACAGCGCCAAGTAG 

>Rv2146c - TB.seq 2405667:2405954 MW:10805 
>emb|AL123456|MTBH37RV:c2405954-2405664, Rv2146c SEQ ID NO:72

TTGGTGGTGTTTTTTCAGATCCTTGGGTTCGCGCTGTTCATCTTCTGGCTGCTGCTGATCGCTCG
GGTCGTCGTTGAGTTCATCCGCTCGTTCAGCCGTGACTGGCGTCCCACCGGTGTCACCGTGGT 
GATCTTGGAGATCATCATGTCGATCACTGATCCGCCGGTGAAGGTGCTGCGCCGGCTGATCCC 
GCAACTCACGATCGGCGCGGTCCGGTTCGACCTGTCGATCATGGTGCTGCTGCTGGTTGCGTT 
CATCGGTATGCAACTGGCGTTTGGTGCTGCGGCCTGA

>Rv2147c - TB.seq 2406119:2406841 MW:27630

>emb|AL123456|MTBH37RV:c2406841-2406116, Rv2147c SEQ ID NO:73 

GTGAATAGTCACTGTAGTCACACCTTCATCACAGACAACAGATCTCCCAGGGCTAGAAGGGGTC 
ACGCAATGAGCACACTGCACAAGGTCAAGGCCTACTTCGGTATGGCTCCCATGGAGGATTACGA
CGACGAGTACTACGACGACCGCGCTCCCTCGCGCGGGTATGCGCGGCCCCGATTCGACGACG 
ACTACGGCCGCTACGATGGGCGCGACTACGACGACGCGCGCAGCGATTCACGCGGTGACCTG 
CGCGGTGAGCCGGCCGACTATCCACCACCGGGATATCGCGGCGGGTACGCGGACGAACCACG 
TTTCCGGCCCCGGGAGTTCGACCGCGCGGAGATGACACGGCCGCGCTTCGGATCGTGGCTGC 
GCAACTCCACCCGCGGCGCGCTAGCGATGGACCCCCGCCGGATGGCGATGATGTTCGAGGAT
GGCCATCCGCTCTCGAAGATCACCACGCTGCGGCCCAAGGACTACAGCGAGGCTCGCACCATC 
GGTGAGCGGTTCCGCGACGGCAGCCCGGTCATCATGGATCTGGTGTCGATGGACAACGCCGAT 
GCCAAGCGGCTGGTCGATTTCGCGGCCGGCCTGGCCTTCGCGCTGCGCGGCTCGTTCGACAA 
GGTCGCGACCAAGGTGTTCCTGCTCTCGCCTGCAGACGTCGATGTGTCCCCCGAGGAGCGCCG 
CAGGATCGCCGAAACCGGGTTCTACGCCTACCAATAG

>Rv2148c - TB.seq 2406841:2407614 MW:27694 
>emb|AL123456|MTBH37RV:c2407614-2406838, Rv2148c SEQ ID NO:74
ATGGCGGCGGATCTTTCGGCGTATCCAGACCGCGAATCGGAATTGACGCATGCGTTGGCGGCA 
ATGCGATCGCGACTTGCGGCGGCCGCGGAGGCGGCGGGTCGCAATGTCGGCGAAATTGAACT 
TCTACCGATTACCAAATTCTTTCCAGCAACCGATGTTGCGATTTTGTTTCGATTGGGTTGTCGG 
CGTTGGCGAATCGCGCGAACAGGAAGCTTCAGCCAAGATGGCCGAACTTAATCGGTTGTTGGC
GGCTGCCGAGTTGGGTCACTCGGGGGGTGTGCACTGGCACATGGTGGGCCGGATTCAACGCA 
ACAAAGCCGGGTCGCTGGCTCGCTGGGCGCACACCGCTCACTCGGTGGACAGCTCGCGGTTG 
GTGACCGCGCTGGATCGGGCGGTTGTTGCGGCGCTGGCCGAACACCGTCGTGGCGAGCGGCT 
GCGGGTTTACGTCCAGGTCAGCCTCGACGGTGACGGATCCCGGGGCGGCGTCGACAGCACGA 
CGCCCGGCGCCGTAGACCGGATTTGCGCGCAGGTGCAGGAGTCAGAGGGCCTCGAACTGGTC
GGGTTGATGGGCATTCCGCCGCTGGATTGGGACCCGGACGAGGCCTTTGACCGGCTGCAATCG 
GAGCACAACCGGGTGCGTGCGATGTTCCCGCACGCGATCGGTCTGTCGGCGGGCATGTCCAAC 
GACCTTGAAGTCGCCGTCAAACATGGTTCGACCTGTGTGCGTGTCGGTACCGCGCTATTGGGTC 
CGCGGCGGTTACGGTCACCGTGA 

>Rv2150c ftsZ TB.seq 2408386:2409522 MW:38757
>emb|AL123456|MTBH37RV:c2409522-2408383, ftsZ SEQ ID NO:75 

ATGACCCCCCCGCACAACTACCTGGCCGTCATCAAGGTCGTGGGTATCGGTGGTGGCGGTGTC 
AACGCCGTCAACCGAATGATCGAGCAGGGCCTCAAAGGCGTGGAATTCATCGCGATCAACACC 

GACGCCCAGGCGTTGTTGATGAGCGATGCCGACGTCAAACTCGACGTCGGCCGCGACTCCACC
CGCGGGCTGGGCGCCGGCGCCGATCCGGAGGTCGGCCGTAAGGCCGCCGAGGACGCCAAGG 
ACGAGATCGAAGAGCTGCTGCGCGGTGCCGACATGGTGTTTGTCACCGCCGGCGAGGGGGGC 
GGAACCGGCACCGGGGGGGCACCCGTCGTCGCCAGCATCGCCCGCAAGCTGGGCGCGTTGAC 
CGTCGGTGTGGTCACCCGGCCGTTCTCGTTCGAGGGCAAGCGACGCAGCAATCAGGCCGAAAA 

TGGCATCGCGGCGCTGCGGGAGAGTTGCGACACCCTCATCGTGATTCCCAACGACCGGTTGCT
GCAGATGGGAGATGCCGCGGTATCGCTGATGGATGCTTTCCGTAGCGCCGACGAGGTGCTGCT 
CAACGGCGTGCAGGGCATCACCGACCTGATTACCACCCCGGGTCTAATCAACGTCGACTTCGC 
CGACGTCAAGGGCATCATGTCCGGTGCCGGCACCGCACTGATGGGCATCGGCTCGGCCCGGG 
GCGAAGGCCGGTCGCTCAAAGCGGCCGAGATCGCCATCAACTCGCCGTTGCTGGAAGCCTCGA 

TGGAGGGCGCGCAAGGCGTGCTGATGTCGATCGCCGGCGGCAGCGACTTGGGCTTGTTCGAG
ATCAACGAGGCGGCCTCGTTGGTACAAGACGCCGCTCACCCCGATGCCAACATCATCTTCGGC 
ACCGTCATCGACGATTCGCTCGGTGACGAGGTGCGGGTGACCGTGATCGCGGCCGGCTTCGAC 
GTCAGCGGTCCCGGCCGCAAGCCGGTGATGGGCGAGACCGGCGGCGCCCACCGGATCGAGT 
CAGCCAAGGCAGGCAAGCTCACCTCGACCTTGTTCGAGCCGGTCGACGCCGTCAGCGTGCCGT 

TGCACACCAACGGCGCAACCCTGAGCATCGGCGGTGATGACGACGATGTCGACGTGCCGCCCT
TCATGCGCCGCTGA 
>Rv2152c murC TB.seq 2410639:2412120 MW:51146
>emb|AL123456|MTBH37RV:c2412120-2410636, murC SEQ ID NO:76 

GTGAGCACCGAGCAGTTGCCGCCCGATCTGCGGCGGGTGCACATGGTCGGCATCGGCGGAGC 
TGGCATGTCGGGCATCGCCCGAATCCTGCTGGACCGCGGCGGGCTGGTCTCCGGGTCAGACG 
CCAAGGAGTCGCGCGGTGTGCATGCGCTGCGGGCGCGGGGCGCGTTGATCCGGATCGGACAC
GACGCGTCGTCGCTGGACCTGTTGCCCGGTGGCGCCACGGCGGTCGTCACTACCCATGCCGC 
CATCCCCAAAACCAACCCCGAGCTCGTCGAAGCGAGGCGCCGCGGCATTCCCGTGGTGCTGCG 
GCCGGCCGTGCTGGCCAAGTTGATGGCCGGGCGCACCACATTGATGGTCACCGGCACGCACG 
GCAAGACAACGACGACGTCCATGCTGATCGTCGCCCTGCAGCACTGCGGGQTTGACCCGTCCT 

TTGCGGTCGGCGGTGAGCTGGGGGAGGCCGGTACCAACGCCCATCACGGCAGTGGCGACTGT
TTCGTCGCCGAAGCCGACGAAAGCGATGGCTCGCTGTTGCAGTACACACCCCACGTCGCGGTG 
ATCACCAACATCGAGTCCGATCACCTGGACTTCTACGGCAGCGTCGAGGCGTATGTTGCGGTGT 
TCGACTCCTTCGTGGAGCGCATTGTCCCCGGGGGTGCGCTGGTGGTGTGCACTGACGACCCCG 
GAGGGGCCGCGCTGGCTCAGCGCGCGACTGAGCTGGGAATTCGAGTGCTGCGATACGGGTCG 

GTGCCGGGTGAGACCATGGCAGCCACGTTGGTCTCGTGGCAGCAACAGGGGGTCGGCGCGGT
CGCACATATCCGGTTGGCCTCAGAACTAGCCACAGCACAGGGTCCCCGCGTGATGCGGCTGTC 
GGTGCCCGGGCGACACATGGCGCTCAACGCGCTGGGAGCGCTGCTGGCCGCGGTGCAGATCG 
GCGCCCCGGCCGACGAGGTGCTCGACGGGCTGGCCGGCTTCGAAGGAGTGCGGCGACGATTC 
GAACTGGTTGGGACCTGCGGCGTCGGAAAGGCGTCGGTGCGCGTGTTCGATGACTACGCCCAC 

CACCCGACGGAGATCAGCGCGACACTGGCGGCGGCGCGCATGGTGCTCGAACAGGGCGACGG
TGGCCGCTGCATGGTTGTGTTTCAACCCCATTTGTATTCGCGGACAAAGGCATTCGCTGCTGAG 
TTTGGGCGTGCGCTGAATGCCGCTGACGAGGTGTTCGTACTCGACGTCTACGGAGCTCGTGAA 
CAACCGCTGGCCGGTGTCAGCGGAGCCAGCGTCGCTGAGCACGTCACTGTGCCGATGCGCTA 
CGTCCCGGATTTTTCGGCGGTCGCACAGCAAGTGGCCGCCGCCGCTAGTCCGGGCGACGTCAT 

CGTCACGATGGGTGCCGGAGACGTGACCTTGCTGGGCCCGGAAATCCTGACCGCCCTTCGGGT
CCGGGCCAACCGAAGCGCCCCCGGCCGTCCGGGGGTGCTGGGATGA 

>Rv2153c murG TB.seq 2412120:2413349 MW:41829
>emb|AL123456|MTBH37RV:c2413349-2412117, murG SEQ ID NO:77

GTGAAGGACACGGTCAGCCAGCCGGCCGGCGGGCGCGGGGCAACGGCGCCCCGGCCCGCCG
ATGCCGCCTCGCCGTCTTGTGGTTCCTCGCCGTCTGCTGATTCCGTGTCGGTCGTTCTCGCCGG 
CGGCGGGACCGCCGGGCACGTCGAGCCCGCCATGGCCGTCGCCGACGCCTTGGTCGCGTTGG 
ATCCGCGCGTCCGGATTACCGCGTTGGGCACCCTCCGTGGACTAGAGACCAGGCTGGTGCCCC 
AGCGCGGCTACCACCTGGAGCTGATCACGGCGGTGCCGATGCCGCGCAAGCCCGGCGGCGAC 

CTGGCCCGGCTGCCGTCGCGGGTGTGGCGCGCCGTCCGGGAGGCCCGGGACGTGCTCGACG
ATGTCGACGCCGACGTCGTCGTCGGTTTCGGTGGGTACGTCGCGCTACCGGCTTACCTAGCCG 
CTCGCGGCCTGCCTTTGCCGCCCCGGCGCCGGCGCCGGATCCCGGTGGTGATCCACGAAGCC 
AACGCCAGGGCGGGACTGGCCAACCGGGTCGGCGCCCATACCGCGGACCGGGTGCTCTCCGC
GGTGCCGGATTCCGGGCTGCGGCGCGCCGAGGTGGTTGGGGTCCCGGTCCGTGCGTCGATCG 
CCGCGCTGGACCGCGCGGTGCTGCGAGCCGAGGCGCGGGCACACTTCGGCTTCCCCGACGAC 
GCGCGGGTGCTGCTGGTGTTCGGGGGTTCGCAGGGCGCGGTCTCGCTCAACCGGGCGGTGTC 
CGGCGCCGCCGCCGACCTGGCCGCCGCCGGTGTTTGCGTGCTGCATGCCCATGGACCCCAGA
ACGTGCTGGAGTTGCGCCGTCGGGCTCAAGGTGACCCACCGTACGTGGCGGTGCCCTATTTGG 
ACCGGATGGAGCTGGCCTACGCCGCCGCCGATCTGGTGATCTGCCGGGCCGGGGCGATGACG 
GTCGCCGAAGTATCCGCCGTCGGTCTGCCGGCCATCTACGTGCCGCTGCCGATCGGCAACGGT 
GAACAGCGGCTGAATGCGTTGCCGGTAGTCAATGCCGGCGGCGGCATGGTGGTCGCCGACGC 
CGCCCTGACCCCCGAGTTGGTGGCCCGCCAGGTTGCCGGGCTGCTCACCGACCCCGCGCGGC
TGGCCGCGATGACCGCGGCCGCAGCCAGGGTGGGACATCGCGATGCCGCGGGCCAGGTGGC 
CCGGGCCGCGCTGGCCGTCGCCACCGGGGCCGGTGCCAGGACAACGACGTGA 

>Rv2154c ftsW TB.seq 2413349:2414920 MW:56306 

>emb|AL123456|MTBH37RV:c2414920-2413346, ftsW SEQ ID NO:78

GTGCTAACCCGGTTGCTGCGTCGGGGCACCAGCGACACCGACGGCTCCCAGACTCGAGGGGC 
CGAGCCGGTCGAGGGGCAGCGGACGGGCCCGGAAGAAGCCTCTAACCCGGGTTCGGCGAGG 
CCCCGCACCCGTTTCGGTGCCTGGCTGGGCCGTCCGATGACCTCGTTTCACCTCATCATCGCC 
GTTGCCGCATTGCTGACCACCCTTGGACTGATCATGGTGCTGTCGGCATCGGCGGTGCGGTCC 

TACGACGACGACGGATCGGCTTGGGTGATCTTCGGCAAGCAGGTCTTGTGGACGCTTGTGGGT
CTTATCGGCGGCTATGTCTGTCTGCGGATGTCGGTGCGGTTCATGCGGCGCATCGCCTTCTCCG 
GTTTCGCGATCACCATCGTGATGCTGGTGCTGGTGCTGGTGCCGGGGATCGGCAAGGAGGCCA 
ACGGCTCGCGCGGCTGGTTCGTGGTCGCGGGCTTCTCGATGCAGCCCTCTGAGCTGGCTAAGA 
TGGCGTTCGCCATCTGGGGAGCGCATCTGCTGGCCGCCCGGCGCATGGAACGGGCTTCACTG 

CGCGAGATGCTGATTCCACTGGTGCCGGCCGCCGTCGTTGCGCTGGCGCTGATCGTGGCCCAG
CCCGACCTCGGACAGACCGTGTCGATGGGCATCATCTTGTTGGGCCTGCTGTGGTATGCGGGG 
CTGCCGCTGCGCGTCTTCCTCAGCTCACTGGCGGCGGTCGTCGTCTCGGCCGCCATCCTGGCG 
GTGTCCGCGGGCTACCGATCCGACCGGGTGCGGTCGTGGCTCAACCCCGAAAACGATCCGCAA 
GACTCCGGCTACCAGGCCCGACAGGCAAAGTTCGCGCTGGCTCAAGGTGGCATTTTCGGCGAC 

GGTCTGGGCCAAGGCGTGGCCAAGTGGAACTACTTGCCCAACGCCCACAACGACTTCATTTTCG
CCATCATCGGCGAAGAGCTGGGTCTCGTCGGCGCGCTCGGACTGCTGGGGCTATTCGGATTGT 
TCGCCTACACCGGCATGCGCATCGCTAGCCGGTCCGCCGACCCGTTCCTGCGGCTGCTGACCG 
CCACCACGACACTGTGGGTGCTGGGACAGGCGTTCATCAACATCGGCTATGTGATCGGGCTGC 
TGCCCGTCACCGGCCTGCAGCTGCCGCTCATCTCCGCCGGTGGAACCTCCACGGCCGCAACAC 

TTTCGCTGATAGGCATCATCGCCAACGCGGCTCGCCACGAACCGGAGGCGGTGGCCGCGCTG
CGGGCTGGGCGCGACGACAAGGTGAACCGGTTGCTGCGGCTGCCGCTGCCCGAGCCGTATCT 
GCCCCCTCGTCTCGAGGCGTTTCGTGACCGCAAGCGCGCCAACCCGCAACCGGCCCAAACGCA 
GCCCGCGCGGAAGACCCCCCGCACGGCGCCCGGACAGCCTGCCCGGCAGATGGGCCTGCCC
CCGCGACCCGGCTCGCCCCGCACGGCCGATCCGCCGGTTCGTCGATCAGTGCATCATGGAGCT 
GGCCAGCGGTACGCGGGCCAGCGTCGCACACGGCGCGTTCGGGCATTGGAAGGTCAGCGTTA 
CGGGTGA 

>Rv2155c murD TB.seq 2414935:2416392 MW:49314
>emb|AL123456|MTBH37RV:c2416392-2414932, murD SEQ ID NO:79 

GTGCTTGACCCTCTGGGGCCGGGTGCGCCCGTGTTGGTAGCCGGTGGCCGGGTGACCGGTCA 
GGCGGTGGCCGCGGTGCTGACTCGGTTTGGTGCGACGCCGACGGTGTGCGACGACGATCCGG 

TCATGCTGCGACCGCACGCCGAACGTGGGCTGCCGACCGTTAGTTCCTCGGACGCGGTGCAGC
AGATAACCGGGTATGCGCTGGTGGTCGCCAGTCCCGGCTTCTCGCCCGCAACCCCGCTACTGG 
CCGCGGCCGCGGCGGCGGGGGTGCCGATCTGGGGTGACGTGGAGTTAGCCTGGCGGCTAGA 
CGCAGCGGGCTGCTACGGACCGCCGCGCAGCTGGCTGGTGGTGACCGGCACCAACGGCAAGA 
CCACCACGACGTCGATGCTGCACGCCATGCTGATCGCCGGTGGCCGCCGCGCCGTGCTGTGC 

GGCAATATCGGCAGTGCGGTGCTGGATGTGCTGGACGAGCCGGCCGAGCTGCTGGCCGTGGA
GTTGTCCAGTTTCCAGCTGCACTGGGCGCCGTCGCTGCGGCCCGAGGCCGGCGCGGTGCTCA 
ACATTGCCGAAGACCACCTGGACTGGCATGCCACGATGGCCGAATACACCGCGGCCAAGGCCC 
GGGTGCTGACCGGCGGGGTAGCGGTGGCCGGGCTGGATGACAGCCGAGCGGCCGCACTGCT 
GGACGGCTCACCGGCGCAGGTGCGGGTCGGCTTCCGGCTCGGCGAGCCGGCCGCGCGGGAA 

CTGGGCGTGCGCGACGCCCACCTGGTCGATCGCGCCTTCTCCGACGACTTGACGCTGCTGCCG
GTCGCGTCGATACCGGTGCCAGGTCCGGTCGGCGTGCTTGACGCCCTGGCCGCGGCGGCGCT 
GGCCCGCTCGGTCGGGGTGCCCGCCGGTGCGATCGCCGACGCGGTCACGTCGTTTCGAGTGG 
GCCGACACCGCGCCGAGGTGGTGGCCGTTGCCGACGGCATCACCTACGTGGACGACTCCAAG 
GCCACCAACCCGCACGCCGCGCGGGCTTCGGTGCTTGCATACCCGAGGGTGGTATGGATCGC 

CGGTGGCCTGCTCAAGGGCGCGTCGCTTCACGCCGAGGTTGCGGCGATGGCGTCGCGGCTGG
TCGGTGCGGTGCTGATCGGCCGGGATCGCGCAGCGGTTGCCGAGGCGTTATCACGACACGCG 
CCCGATGTCCCAGTCGTTCAGGTTGTGGCAGGCGAGGATACTGGTATGCCTGCGACTGTTGAG 
GTTCCTGTTGCTTGTGTTCTAGATGTGGCAAAAGATGACAAAGCCGGTGAGACCGTTGGCGCTG 
CCGTGATGACCGCTGCGGTGGCCGCGGCCCGGCGGATGGCCCAACCCGGTGACACCGTGCTG 

CTGGCACCGGCCGGCGCCTCATTCGACCAGTTCACCGGTTATGCCGACCGGGGCGAGGCATTC
GCGACCGCGGTCCGCGCGGTGATCCGGTAG

>Rv2156c murX TB.seq 2416397:2417473 MW:37714 
>emb|AL123456|MTBH37RV:c2417473-2416394, murX SEQ ID NO:80 
ATGAGGCAGATCCTTATCGCCGTTGCCGTAGCGGTGACGGTGTCCATCTTGCTGACCCCGGTG
CTGATCCGGTTGTTCACTAAGCAGGGCTTCGGCCACCAGATCCGTGAGGATGGCCCGCCCAGC 
CACCACACCAAGCGCGGTACGCCGTCGATGGGCGGGGTGGCGATTCTGGCCGGCATCTGGGC 
GGGCTACCTGGGCGCCCACCTAGCGGGCCTGGCGTTTGACGGTGAAGGCATCGGCGCATCGG
GTCTGTTGGTGCTGGGCCTAGCCACCGCTTTGGGCGGCGTCGGGTTCATCGACGATCTGATCA 
AGATCCGCAGGTCGCGCAATCTCGGGTTGAACAAGACGGCCAAGACCGTCGGGCAGATCACCT 
CCGCCGTGCTGTTTGGCGTGCTGGTGCTGCAGTTCCGGAATGCTGCCGGCCTGACACCGGGCA 
GCGCGGATCTGTCCTACGTGCGTGAGATCGCCACCGTCACATTGGCGCCGGTGCTGTTCGTGT
TGTTCTGCGTGGTCATCGTCAGCGCCTGGTCGAACGCGGTCAACTTCACCGATGGCCTGGACG 
GGCTGGCCGCCGGCACCATGGCGATGGTCACCGCCGCCTACGTGCTGATCACCTTCTGGCAGT 
ACCGCAACGCGTGCGTGACGGCGCCGGGCCTGGGCTGCTACAACGTGCGCGACCCGCTGGAC 
CTGGCGCTCATCGCGGCCGCAACCGCTGGCGCCTGCATCGGTTTTTTGTGGTGGAACGCCGCG

CCCGCCAAGATCTTCATGGGTGACACTGGGTCGCTGGCGTTGGGCGGCGTCATCGCGGGGTTG
TCGGTGACCAGCCGCACCGAGATCCTTGCGGTGGTGCTGGGTGCGCTGTTCGTCGCCGAGATC 
ACCTCGGTGGTGTTGCAAATCCTGACCTTCCGGACCACCGGGCGCCGGATGTTTCGGATGGCG 
CCCTTCCACCACCATTTCGAGTTGGTCGGTTGGGCTGAAACCACGGTCATCATCCGGTTCTGGC 
TGCTCACCGCGATCACCTGCGGTCTGGGCGTGGCCTTGTTCTACGGTGAGTGGCTTGCCGCGG 

TCGGTGCCTGA

>Rv2157c murF TB.seq 2417473:2419002 MW:51634 
>emb|AL123456|MTBH37RV:c2419002-2417470, murF SEQ ID NO:81

ATGATCGAGCTGACCGTCGCGCAGATCGCCGAGATCGTCGGGGGCGCAGTGGCCGATATCTCC 

CCGCAAGACGCCGCGCACCGCCGCGTCACCGGGACCGTCGAGTTCGACTCGCGCGCCATCGG
CCCGGGCGGGCTGTTCCTCGCCCTGCCGGGGGCGCGCGCCGACGGCCACGACCATGCCGCG 
TCGGCGGTAGCCGCGGGCGCCGCCGTCGTGCTGGCCGCCCGCCCGGTGGGGGTGCCGGCCA 
TCGTGGTTCCGCCAGTGGCCGCGCCGAACGTATTGGCCGGCGTCCTCGAGCACGACAACGAC 
GGGTCGGGGGCGGCGGTGCTGGCCGCGCTGGCCAAGCTGGCCACCGCGGTGGCCGCGCAGT 

TGGTGGCCGGCGGGCTCACCATCATCGGGATCACCGGCTCGTCGGGCAAGACGTCGACCAAG
GACCTGATGGCCGCCGTGCTGGCCCCGCTGGGGGAGGTGGTGGCCCCGCCCGGATCGTTCAA 
CAACGAGCTGGGTCACCCGTGGACGGTGCTGCGCGCGACGCGGCGCACCGACTACCTGATTTT 
GGAGATGGCGGCACGCCATCACGGCAACATCGCCGCGCTCGCCGAGATCGCGCCCCCGTCGA 
TCGGAGTCGTGCTCAACGTCGGCACCGCACATTTGGGTGAGTTCGGCTCCCGCGAGGTCATCG 

CACAGACCAAAGCCGAACTGCCGCAGGCTGTTCCGCATTCCGGAGCGGTCGTCCTCAACGCTG
ATGACCCCGCGGTGGCGGCGATGGCCAAGCTGACCGCGGCCCGGGTGGTGCGGGTCAGCCG 
GGACAACACCGGTGACGTTTGGGCGGGGCCGGTGTCGCTGGACGAATTGGCCAGGCCGCGCT 
TTACGCTGCATGCCCACGATGCCCAAGCCGAGGTCCGACTCGGGGTCTGCGGCGACCACCAG 
GTCACTAACGCGCTGTGCGCCGCGGCGGTCGCGCTGGAGTGTGGGGCCAGCGTTGAACAGGT 

CGCGGCCGCGCTGACCGCGGCGCCGCCGGTGTCGCGGCATCGGATGCAGGTGACCACCCGC
GGCGACGGGGTGACGGTGATCGACGACGCCTACAACGCCAACCCCGACTCCATGCGGGCCGG 
GCTGCAGGCGCTGGCCTGGATCGCGCACCAACCCGAGGCCACCCGCCGCAGCTGGGCGGTGC 
TGGGTGAGATGGCCGAGCTGGGTGAGGACGCGATAGCCGAGCACGATCGCATCGGCCGGCTC
GCGGTGCGCTTAGATGTGTCTCGACTCGTTGTCGTGGGAACCGGGAGGTCGATCAGCGCCATG 
CACCACGGAGCGGTCCTGGAGGGGGCGTGGGGCTCGGGGGAAGCCACTGCTGATCACGGTGC 
GGATCGCACGGCCGTCAATGTGGCCGACGGTGACGCCGCCCTGGCACTACTGCGCGCCGAGC 
TGCGACCCGGGGATGTGGTCTTGGTCAAGGCCTCGAACGCGGCCGGGCTGGGTGCGGTGGCC
GATGCATTGGTCGCAGACGACACATGCGGGAGTGTGCGCCCATGA 

>Rv2158c murE TB.seq 2419002:2420606 MW:55310 
>emb|AL123456|MTBH37RV:c2420606-2418999, murE SEQ ID NO:82 

GTGTCATCGCTGGCCCGAGGGATCTCGCGGCGGCGAACGGAGGTGGCGACACAGGTGGAGGC
TGCGCCCACTGGCTTGCGCCCCAACGCCGTCGTGGGCGTTCGGTTGGCCGCACTGGCCGATCA 
GGTCGGCGCGGCCCTGGCCGAGGGTCCAGCTCAGCGTGCCGTCACCGAGGACCGGACGGTCA 
CCGGGGTCACGCTGCGCGCCCAGGACGTGTCACCCGGTGACCTGTTCGCCGCCCTGACCGGC 
TCGACCACCCACGGGGCCCGCCACGTCGGCGACGCGATCGCACGCGGCGCCGTCGCGGTGCT 

CACCGACCCCGCCGGGGTCGCCGAGATCGCCGGACGAGCGGCCGTGCCCGTGTTGGTGCACC
CCGCACCCCGCGGCGTGCTCGGCGGCTTGGCCGCCACCGTGTACGGGCATCCGTCCGAGCGG 
TTGACGGTTATCGGGATCACCGGAACGTCCGGCAAGACCACCACCACCTATCTGGTCGAGGCC 
GGGTTACGGGCTGCCGGACGCGTCGCCGGGCTGATCGGCACCATCGGCATCCGCGTCGGCGG 
CGCCGACCTTCCCAGCGCGCTGACCACCCCGGAGGCCCCCACGCTGCAGGCGATGCTGGCGG 

CGATGGTCGAACGCGGGGTGGACACCGTGGTCATGGAGGTGTCCAGCCACGCGCTGGCGCTG
GGCCGGGTGGACGGCACCCGGTTCGCCGTCGGCGCCTTCACCAATCTCTCCCGTGACCACCTG 
GATTTCCACCCCAGCATGGCCGACTACTTCGAGGCCAAGGCGTCATTGTTCGATCCGGACTCGG 
CACTGCGCGCCCGCACCGCCGTGGTGTGCATCGACGACGACGCCGGGCGCGCGATGGCGGC 
GCGGGCCGCCGACGCGATCACCGTCAGCGCCGCCGACCGGCCCGCACACTGGCGCGCCACG 

GATGTGGCGCCCACGGACGCGGGCGGGCAACAATTCACCGCCATCGACCCCGCCGGCGTAGG
GCATCACATCGGAATCCGGCTACCGGGCCGCTACAACGTCGCCAATTGCCTGGTCGCCCTGGC 
GATTCTGGACACCGTCGGGGTCTCCCCGGAACAGGCGGTGCCGGGCCTGCGTGAGATCCGGG 
TCCCGGGGCGGCTCGAGCAGATCGACCGCGGCCAGGGCTTTCTCGCGCTGGTCGACTACGCG 
CACAAACCGGAAGCGCTGCGGTCGGTGCTGACCACCTTGGCGCACCCGGACCGCCGGCTGGC 

GGTGGTGTTCGGCGCCGGCGGCGATCGTGACCCGGGCAAGCGGGCCCCGATGGGCCGGATA
GCCGCGCAGCTGGCCGACTTGGTGGTCGTCACCGACGACAACCCGCGTGACGAAGATCCCAC 
GGCGATCCGCCGCGAAATCCTGGCTGGGGCGGCCGAAGTCGGCGGTGATGCCCAGGTCGTCG 
AGATCGCAGACCGGCGGGACGCGATCCGGCACGCGGTTGCCTGGGCGCGCCCCGGCGACGT 
GGTGCTCATCGCCGGCAAAGGCCACGAGACCGGGCAACGCGGCGGCGGGCGGGTCCGCCCG 

TTCGACGACCGGGTGGAGCTGGCTGCCGCGCTAGAGGCCCTCGAGCGGCGCGCATGA

>Rv2159c - TB.seq 2420632:2421663 MW:36377 
>emb|AL123456|MTBH37RV:c2421663-2420629, Rv2159c SEQ ID NO:83

ATGAAATTTGTCAACCATATTGAGCCCGTCGCGCCCCGCCGAGCCGGCGGCGCGGTCGCCGAG 
GTCTATGCCGAGGCCCGCCGCGAGTTCGGCCGGCTGCCCGAGCCGCTCGCCATGCTGTCCCC 
GGACGAGGGACTGCTCACCGCCGGCTGGGCGACGTTGCGCGAGACACTGCTGGTGGGCCAGG 
TGCCGCGTGGCCGCAAGGAAGCCGTCGCCGCCGCCGTCGCGGCCAGCCTGCGCTGCCCCTGG
TGCGTCGACGCACACACCACCATGCTGTACGCGGCAGGCCAAACCGACACCGCCGCGGCGAT 
CTTGGCCGGCACAGCACCTGCCGCCGGTGACCCGAACGCGCCGTATGTGGCGTGGGCGGCAG 
GAACCGGGACACCGGCGGGACCGCCGGCACCGTTCGGCCCGGATGTCGCCGCCGAATACCTG 
GGCACCGCGGTGCAATTCCACTTCATCGCACGCCTGGTCCTGGTGCTGCTGGACGAAACCTTC 

CTGCCGGGGGGCCCGCGCGCCCAACAGCTCATGCGCCGCGCCGGTGGACTGGTGTTCGCCCG
CAAGGTGCGCGCGGAGCATCGGCCGGGCCGCTCCACCCGCCGGCTCGAGCCGCGAACGCTG 
CCCGACGATCTGGCATGGGCAACACCGTCCGAGCCCATAGCAACCGCGTTCGCCGCGCTCAGC 
CACCACCTGGACACCGCGCCGCACCTGCCGCCACCGACTCGTCAGGTGGTCAGGCGGGTCGT 
GGGGTCGTGGCACGGCGAGCCAATGCCGATGAGCAGTCGCTGGACGAACGAGCACACCGCCG 

AGCTGCCCGCCGACCTGCACGCGCCCACCCGTCTTGCCCTGCTGACCGGCCTGGCCCCGCAT
CAGGTGACCGACGACGACGTCGCCGCGGCCCGATCCCTGCTCGACACCGATGCGGCGCTGGT 
TGGCGCCCTGGCCTGGGCCGCCTTCACCGCCGCGCGGCGCATCGGCACCTGGATCGGCGCCG 
CCGCCGAGGGCCAGGTGTCGCGGCAAAACCCGACTGGGTGA 

>Rv2163c pbpB TB.seq 2425049:2427085 MW:72506

>emb|AL123456|MTBH37RV:c2427085-2425046, pbpB SEQ ID NO:84 

GTGAGCCGCGCCGCCCCCAGGCGGGCCAGTCAGTCGCAGTCGACGCGACCGGCGCGCGGTTT 
GCGCCGGCCACCGGGAGCCCAGGAGGTTGGGCAACGCAAACGGCCCGGCAAAACGCAGAAAG 
CCCGGCAAGCCCAGGAAGCCACGAAATCCCGCCCTGCGACACGGTCAGACGTCGCACCCGCG 

GGTCGCTCGACTCGTGCGAGGCGCACCCGGCAGGTGGTGGACGTCGGGACGCGCGGTGCGTC
GTTCGTCTTTCGGCATCGGACCGGAAACGCGGTCATCTTGGTGTTGATGTTGGTCGCGGCAACA 
CAATTGTTCTTTCTGCAGGTATCACATGCCGCGGGCCTGCGTGCGCAGGCGGCCGGCCAACTC 
AAGGTCACCGACGTCCAGCCAGCGGCTCGCGGCAGCATCGTCGACCGCAACAATGACCGGCTC 
GCGTTCACCATCGAGGCGCGTGCCCTGACGTTCCAGCCGAAGCGGATTCGGCGGCAATTGGAA 

GAGGCCAGGAAGAAGACGTCGGCTGCACCCGACCCGCAGCAGCGCCTGCGCGATATCGCCCA
GGAGGTCGCCGGCAAGCTGAACAACAAGCCAGATGCCGCGGCCGTGCTGAAGAAGCTGCAAA 
GCGACGAGACCTTCGTCTACTTGGCGCGTGCGGTCGACCCGGCTGTCGCCAGCGCGATCTGCG 
CGAAGTATCCCGAGGTCGGTGCGGAAAGACAGGATCTGCGTCAGTACCCGGGTGGGTCGCTG 
GCGGCAAACGTCGTCGGTGGCATCGACTGGGATGGTCATGGGCTGCTGGGTCTGGAGGACTCC 

CTGGATGCGGTGCTGGCCGGAACCGACGGATCGGTCACCTACGACCGTGGGTCAGACGGCGT
CGTCATCCCCGGCAGCTACCGGAATCGGCACAAGGCGGTCCACGGTTCCACCGTCGTGCTCAC 
CCTCGACAACGACATCCAGTTCTACGTGCAGCAGCAGGTGCAGCAGGCCAAGAACCTATCGGG 
GGCTCACAACGTCTCGGCCGTCGTCCTGGACGCCAAGACCGGCGAGGTGCTCGCGATGGCCA
ACGACAACACCTTCGACCCGTCGCAAGACATCGGGCGCCAGGGCGACAAGCAGTTGGGCAACC 
CGGCGGTGTCGTCGCCCTTCGAGCCGGGCTCGGTGAACAAGATCGTCGCCGCGTCCGCGGTC 
ATCGAGCACGGGTTGAGCAGCCCCGACGAGGTGCTACAGGTGCCTGGCTCGATCCAGATGGG 
CGGTGTTACCGTGCATGACGCTTGGGAGCACGGCGTGATGCCCTATACCACCACGGGGGTGTT
CGGAAAGTCCTCCAACGTCGGCACGCTGATGCTTTCCCAACGTGTCGGACCGGAACGCTATTAC 
GATATGCTCCGCAAGTTCGGGTTGGGACAGCGCACCGGCGTGGGCCTGCCCGGTGAGAGCGC 
CGGACTGGTGCCGCCAATCGACCAGTGGTCGGGCAGTACGTTCGCTAATCTTCCTATTGGCCAA 
GGTCTTTCGATGACTTTGCTGCAGATGACCGGCATGTACCAGGCCATCGCCAACGATGGAGTGC 

GGGTACCCCCACGCATTATCAAGGCCACCGTCGCACCCGACGGCAGCCGAACCGAAGAACCGC
GCCCCGACGACATTCGCGTGGTGTCGGCGCAGACCGCCCAGACCGTGCGCCAGATGCTGCGT 
GCCGTGGTGCAACGCGATCCGATGGGCTACCAGCAGGGTACCGGGCCGACGGCCGGGGTGCC 
CGGCTATCAGATGGCCGGCAAGACCGGTACCGCGCAGCAGATCAACCCTGGCTGCGGCTGCTA 
CTTCGACGACGTGTATTGGATCACCTTCGCCGGAATCGCCACTGCCGACAATCCCCGCTACGTG 

ATCGGCATCATGTTGGACAACCCGGCGCGCAACTCCGACGGCGCGCCTGGGCACTCGGCCGC
CCCGCTGTTCCACAACATCGCGGGCTGGCTGATGCAGCGCGAAAACGTCCCGCTGTCACCCGA 
TCCCGGGCCTCCTTTGGTCTTGCAGGCCACCTAG 

>Rv2165c - TB.seq 2428236:2429423 MW:42498 

>emb|AL123456|MTBH37RV:c2429423-2428233, Rv2165c SEQ ID NO:85

GTGCAAACCCGTGCACCGTGGTCTCTGCCCGAAGCGACCCTGGCGTACTTCCCCAACGCCAGG 
TTCGTGTCTTCGGACAGGGACCTCGGTGCAGGGGCGGCGCCTGGAATAGCCGCGTCCCGAAGT 
ACGGCTTGCCAGACCTGGGGAGGTATCACGGTGGCTGATCCAGGTTCGGGGCCAACCGGTTTC 
GGTCATGTGCCGGTATTGGCGCAACGTTGCTTCGAACTGCTTACCCCCGCACTAACCCGCTACT 

ATCCAGACGGCTCGCAGGCGGTCCTTCTCGACGCGACCATCGGCGCGGGCGGGCATGCGGAG
CGGTTTTTGGAGGGATTGCCGGGTCTGCGCCTGATCGGGCTCGACCGTGACCCAACCGCTCTG 
GACGTCGCGCGGTCTCGGCTGGTGCGATTCGCTGACCGACTTACCCTGGTGCACACCCGCTAT 
GACTGTCTGGGCGCAGCGCTGGCTGAATCCGGTTATGCCGCAGTGGGATCAGTCGACGGAATC 
CTGTTCGATCTCGGCGTCTCATCCATGCAGCTCGACCGCGCCGAGCGGGGCTTCGCCTACGCC 

ACGGACGCGCCATTGGACATGCGGATGGACCCGACGACGCCGTTGACCGCAGCTGACATTGTC
AACACTTACGACGAGGCGGCACTAGCCGACATCCTGCGTCGCTACGGAGAGGAGCGGTTTGCT 
CGGCGCATCGCTGCCGGTATCGTCCGCCGACGCGCAAAAACCCCGTTCACCTCGACCGCCGAA 
CTGGTTGCCCTGCTGTACCAGGCGATTCCAGCTCCGGCCCGGCGTGTCGGCGGGCATCCAGCC 
AAGCGAACATTCCAGGCGCTGCGCATCGCGGTCAACGATGAGCTGGAATCGCTGCGCACGGCC 

GTTCCTGCCGCGCTGGATGCCCTCGCTATCGGTGGGCGCATCGCGGTGCTGGCCTACCAGTCG
CTAGAGGACAGGATCGTCAAACGGGTGTTCGCCGAGGCAGTCGCGTCGGCCACCCCTGCGGG 
ACTTCCGGTCGAACTTCCCGGCCATGAGCCGCGATTCCGTTCGTTAACGCACGGCGCCGAACG 
AGCGAGTGTGGCTGAGATCGAACGCAATCCCCGCAGTACTCCAGTGCGGTTGCGGGCCCTGCA
ACGAGTCGAGCACCGGGCGCAATCGCAGCAATGGGCAACCGAGAAGGGTGATTCATGA 

>Rv2166c - TB.seq 2429428:2429856 MW:15912
>emb|AL123456|MTBH37RV:c2429856-2429425, Rv2166c SEQ ID NO:86

ATGTTTCTCGGCACCTACACGCCCAAACTCGACGACAAGGGGCGGCTGACGCTGCCGGCCAAG 
TTTCGCGACGCGTTGGCAGGGGGGTTGATGGTCACCAAGAGCCAAGATCACAGCCTGGCCGTT 
TACCCGCGGGCGGCGTTCGAGCAGCTGGCGCGCCGGGCCAGCAAGGCGCCACGAAGCAACC 
CCGAGGCGAGAGCGTTCCTACGTAATCTCGCCGCCGGTACCGACGAACAGCATCCCGACAGTC 
AAGGCCGGATCACCTTGTCGGCCGACCACCGCCGCTACGCAAGCCTTTCCAAGGACTGTGTGG
TGATCGGCGCGGTCGACTATCTCGAGATCTGGGATGCGCAAGCCTGGCAGAACTACCAACAAAT 
CCATGAAGAGAACTTCTCCGCGGCCAGCGATGAAGCACTCGGTGACATCTTCTGA 

>Rv2197c - TB.seq 2461505:2462146 MW:22481 
>emb|AL123456|MTBH37RV:c2462146-2461502, Rv2197c SEQ ID NO:87

ATGGTGAGCAGATATTCCGCATACCGGCGTGGGCCGGATGTAATCTCGCCGGACGTCATCGAT 
CGCATCCTGGTTGGGGCATGTGCCGCGGTGTGGCTGGTGTTCACCGGCGTGTCGGTGGCCGC 
CGCTGTCGCCCTGATGGACCTGGGTAGGGGCTTCCACGAGATGGCCGGAAACCCGCACACCAC 
GTGGGTGCTGTACGCCGTAATTGTGGTCTCCGCACTGGTCATCGTGGGCGCGATACCGGTGCT 
GTTGCGAGCTCGCCGCATGGCTGAGGCCGAGCCCGCGACGAGGCCGACGGGTGCATCCGTGC
GGGGCGGGCGATCGATCGGATCCGGGCATCCGGCGAAACGCGCTGTGGCCGAGTCGGCACCC 
GTACAGCACGCGGATGCATTCGAGGTGGCCGCCGAGTGGTCCAGTGAGGCGGTGGACCGGAT 
CTGGTTGCGCGGGACAGTCGTGTTGACCAGTGCGATTGGCATTGCGTTGATTGCCGTGGCGGC 
GGCGACCTACCTCATGGCGGTCGGTCACGACGGGCCATCTTGGATCAGCTACGGGTTGGCCGG 
GGTGGTCACCGCGGGCATGCCGGTGATCGAGTGGCTATACGCTCGGCAGCTGCGCCGGGTGG
TGGCGCCCCAGTCCAGTTAG 

>Rv2198c - TB.seq 2462149:2463045 MW:30955

>emb|AL123456|MTBH37RV:c2463045-2462146, mmpS3 SEQ ID NO:88 
ATGAGCGGGCCGAATCCCCCGGGACGGGAACCTGACGAACCCGAATCGGAACCCGTCAGCGA
CACGGGCGACGAACGGGCTTCCGGCAACCACTTGCCGCCCGTCGCCGGGGGCGGCGACAAAC 
TGCCCAGTGACCAGACGGGCGAGACCGACGCATATTCTCGGGCATACTCTGCCCCGGAATCCG 
AGCACGTCACCGGCGGCCCGTATGTGCCAGCCGATCTCAGGCTCTATGACTACGACGACTATG 
AGGAGTCGTCCGACCTGGACGACGAACTGGCCGCTCCGCGCTGGCCGTGGGTGGTCGGTGTC 
GCCGCCATAATTGCCGCCGTTGCGCTCGTGGTTTCGGTGTCGTTGCTCGTCACGCGACCACATA
CCAGCAAACTCGCCACCGGCGACACTACGTCCTCTGCACCGCCCGTGCAGGACGAAATCACGA 
CCACCAAGCCGGCGCCGCCACCGCCGCCACCAGCCCCACCGCCCACCACCGAGATCCCGACA 
GCGACGGAGACACAGACGGTCACTGTGACGCCGCCACCACCGCCCCCACCGGCGACAACCAC
GGCGCCGCCGCCGGCGACCACCACAACGGCGGCGGCACCGCCGCCCACGACCACCACGCCG 
ACCGGTCCGCGGCAAGTCACCTATTCGGTGACCGGTACCAAGGCGCCGGGTGACATTATCTCG 
GTGACTTACGTCGATGCCGCCGGGCGCCGACGGACACAGCACAATGTGTACATCCCGTGGTCC 
ATGACGGTCACCCCGATCTCGCAATCCGACGTTGGCTCGGTGGAGGCCTCCAGCCTTTTCCGG
GTCAGCAAACTCAACTGCTCGATCACCACGAGCGACGGAACGGTGCTCTCATCGAACTCCAACG 
ATGGACCGCAAACGAGCTGCTGA 
>Rv2199c - TB.seq 2463234:2463650 MW:14866 

>emb|AL123456|MTBH37RV:c2463650-2463231, Rv2199c SEQ ID NO:89
ATGCATATCGAAGCCCGACTGTTTGAGTTTGTCGCCGCGTTCTTCGTGGTGACGGCGGTGCTGT
ACGGCGTGTTGACCTCGATGTTCGCCACCGGTGGTGTCGAGTGGGCTGGCACCACTGCGCTGG 
CGCTTACCGGCGGCATGGCGTTGATCGTCGCCACCTTCTTCCGGTTTGTGGCCCGCCGGTTAG 
ATTCCCGGCCCGAGGACTACGAAGGCGCTGAAATCAGCGACGGCGCAGGAGAACTTGGATTCT 
TCAGTCCGCATAGCTGGTGGCCGATCATGGTCGCGTTGTCCGGCTCGGTGGCAGCGGTCGGCA 
TCGCGTTGTGGCTCCCGTGGCTGATCGCCGCCGGTGTGGCATTCATCCTCGCCTCGGCGGCCG
GATTGGTCTTCGAATATTACGTCGGTCCTGAGAAGCACTGA 

>Rv2200c ctaC TB.seq 2463661:2464749 MW:40449
>emb|AL123456|MTBH37RV:c2464749-2463658, ctaC SEQ ID NO:90

GTGACACCTCGCGGGCCAGGTCGTTTGCAACGCTTGTCGCAGTGCAGGCCTCAGCGCGGCTCC
GGAGGGCCTGCCCGTGGTCTTCGACAGCTGGCGCTCGCAGCAATGCTGGGGGCATTGGCCGT 
CACCGTCAGTGGATGCAGCTGGTCGGAAGCCCTGGGCATCGGTTGGCCGGAGGGCATTACCC 
CGGAGGCACACCTCAATCGAGAACTGTGGATCGGGGCGGTGATCGCCTCCCTGGCGGTTGGG 
GTAATCGTGTGGGGTCTCATCTTCTGGTCCGCGGTATTTCACCGGAAGAAGAACACCGACACTG 

AGTTGCCCCGCCAGTTCGGCTACAACATGCCGCTAGAGCTGGTTCTCACCGTCATACCGTTCCT
CATCATCTCGGTGCTGTTTTATTTCACCGTCGTGGTGCAGGAGAAGATGCTGCAGATAGCCAAG 
GATCCCGAGGTCGTGATTGATATCACGTCTTTCCAGTGGAATTGGAAGTTTGGCTATCAAAGGGT 
GAACTTCAAAGACGGCACACTGACCTATGATGGTGCCGATCCGGAGCGCAAGCGCGCCATGGT 
TTCCAAGCCAGAGGGCAAGGACAAGTACGGCGAAGAGCTGGTCGGGCCGGTGCGCGGGCTCA 

ACACCGAGGACCGGACCTACCTGAATTTCGACAAGGTCGAGACGTTGGGCACCAGCACCGAAA
TTCCGGTGCTGGTGCTGCCGTCCGGCAAGCGTATCGAATTCCAAATGGCCTCAGCCGATGTGAT 
ACACGCATTCTGGGTGCCGGAGTTCTTGTTCAAGCGTGACGTGATGCCTAACCCGGTGGCAAAC 
AACTCGGTCAACGTCTTCCAGATCGAAGAAATCACCAAGACCGGAGCATTCGTGGGCCACTGCG 
CCGAGATGTGTGGCACGTATCACTCGATGATGAACTTCGAGGTCCGCGTCGTGACCCCCAACG 

ATTTCAAGGCCTACCTGCAGCAACGCATCGACGGGAAGACAAACGCCGAGGCCCTGCGGGCGA
TCAACCAGCCGCCCCTTGCGGTGACCACCCACCCGTTTGATACTCGCCGCGGTGAATTGGCCC 
CGCAGCCCGTAGGTTAG 
>Rv2427c proA g-glutamyl phosphate reductase TB.seq 2724231:2725475 MW:43746
>emb|AL123456|MTBH37RV:c2725475-2724228, proA SEQ ID NO:91 

ATGACCGTGCCAGCACCGTCGCAGCTCGACTTGCGTCAAGAGGTGCACGACGCCGCACGCCG 
CGCCCGGGTGGCCGCCCGCCGGCTGGCATCGCTGCCGACGACTGTCAAAGACCGCGCGCTGC
ACGCGGCTGCCGACGAGCTACTGGCTCACCGCGACCAGATCCTGGCGGCCAACGCCGAAGAC 
CTGAACGCGGCGCGCGAGGCGGACACCCCGGCCGCCATGCTGGACCGGTTGTCCTTGAACCC 
GCAACGAGTCGACGGTATCGCCGCCGGGTTGCGGCAAGTCGCGGGACTGCGCGATCCGGTCG 
GTGAAGTGCTGCGTGGCTATACCCTGCCCAACGGGCTGCAGCTGCGCCAGCAGCGCGTCCCCC 

TGGGCGTGGTCGGCATGATCTACGAGGGCCGCCCCAATGTCACCGTGGATGCCTTCGGGCTGA
CACTCAAGTCGGGTAACGCTGCATTGCTGCGCGGCAGCTCGTCGGCCGCAAAGTCCAACGAGG 
CCCTGGTGGCGGTGTTACGCACCGCGCTGGTCGGCCTGGAGCTGCCGGCCGACGCGGTCCAG 
CTGCTGTCGGCTGCCGACCGCGCCACCGTCACTCACCTGATTCAGGCCCGCGGCCTGGTCGAT 
GTGGTGATTCCACGCGGGGGAGCGGGCCTGATCGAGGCGGTCGTACGCGATGCCCAGGTGCC 

CACCATCGAGACCGGCGTCGGGAACTGCCATGTCTACGTGCACCAAGCGGCCGACCTGGACGT
GGCCGAGCGTATCTTGCTGAACTCCAAGACGCGGCGGCCCAGCGTCTGCAACGCCGCCGAGA 
CGCTGCTGGTCGACGCAGCGATCGCCGAAACGGCGTTGCCTCGATTGCTGGCCGCCCTGCAGC 
ACGCCGGTGTCACCGTACATCTCGACCCGGACGAGGCCGACCTGCGCCGCGAATACCTGTCGC 
TGGACATCGCGGTGGCGGTGGTCGACGGTGTCGACGCTGCCATCGCCCATATCAACGAATACG 

GCACCGGGCACACAGAAGCGATTGTGACCACCAATCTTGATGCGGCCCAACGCTTTACCGAACA
GATCGATGCGGCCGCGGTGATGGTGAACGCATCAACGGCGTTCACCGACGGCGAGCAATTCGG 
CTTCGGCGCCGAGATCGGCATCTCCACCCAGAAACTGCATGCCCGCGGACCGATGGGACTACC 
GGAATTGACGTCGACCAAGTGGATCGCATGGGGAGCCGGCCACACCCGTCCGGCCTGA 



>Rv2438c - similar to YHN4_YEAST P38795 TB.seq 2734793:2737006 MW:80492
>emb|AL123456|MTBH37RV:c2737006-2734790, Rv2438c SEQ ID NO:92 

ATGGGACTGCTCGGCGGCCAATCAGGGCCCAGGGTCGGCAGCGGCCCAGTCGGTAGCATCCC 
CACGCCGGTCAATGCCGCCATCTGCCAGCAGCGCGGGGGATTCCACGGTGTCGAGCGTGGAT 
ACTCGGCGGGTGATTCGGGCGTTCTGACGTCGCTGGGCGACAATGAAAGGACGATGAACTTTT 

ACTCCGCCTACCAGCACGGGTTCGTGCGCGTTGCCGCCTGCACTCACCACACCACCATCGGTG
ACCCGGCGGCCAACGCCGCGTCGGTATTGGACATGGCCCGTGCGTGCCACGACGATGGCGCA 
GCGTTGGCGGTCTTTCCTGAGCTGACGCTGTCGGGCTACTCCATCGAGGACGTACTACTGCAG 
GACTCTCTGCTCGATGCCGTCGAGGACGCGCTGCTCGACCTGGTGACCGAATCCGCCGACCTG 
TTACCTGTACTGGTGGTCGGGGCTCCGCTGCGGCATCGACACCGCATCTACAACACCGCGGTC 

GTCATTCACCGCGGCGCCGTGCTCGGCGTGGTGCCCAAGTCGTATCTACCCACCTATCGCGAG
TTCTACGAGCGGCGCCAGATGGCGCCCGGAGACGGGGAGCGGGGCACGATCCGCATCGGTGG 
CGCCGACGTGGCCTTCGGCACGGACCTGTTGTTCGCCGCGTCAGATCTACCCGGCTTTGTGTT 
GCATGTGGAGATCTGCGAGGACATGTTTGTGCCGATGCCGCCCAGCGCCGAGGCGGCCCTGG
CGGGCGCGACGGTGCTGGCGAATCTGTCCGGCAGCCCGATCACCATCGGCCGTGCCGAGGAC 
CGCCGGCTGCTTGCGCGCTCGGCGTCGGCGCGGTGTCTGGCTGCCTATGTCTATGCCGCCGC 
GGGGGAGGGGGAGTCAACGACGGACCTGGCCTGGGACGGTCAGACGATGATCTGGGAGAATG 
GCGCACTGCTCGCGGAGTCCGAACGTTTCCCCAAAGGAGTGCGCCGCAGTGTCGCCGACGTTG
ACACCGAGTTGCTTCGGTCGGAGCGGCTGCGGATGGGCACGTTCGACGACAACCGGCGTCAC 
CACCGGGAGTTAACGGAATCGTTCCGGCGCATCGACTTCGCACTCGACCCACCGGCAGGCGAC 
ATCGGACTGCTGCGCGAGGTCGAGCGGTTCCCGTTCGTTCCGGCCGATCCGCAACGATTGCAA 
CAGGATTGCTACGAGGCCTACAACATCCAGGTGTCTGGACTCGAGCAACGGTTGCGGGCGCTG 

GACTATCCGAAGGTCGTTATCGGTGTGTCCGGGGGATTGGACTCGACGCACGCGCTGATCGTC
GCGACCCATGCCATGGACCGCGAGGGCCGGCCGCGCAGCGACATTCTGGCGTTTGCGTTGCC 
CGGATTCGCCACCGGGGAGCACACTAAGAACAACGCGATCAAGCTGGCACGTGCGCTGGGGG 
TTACCTTCTCCGAAATCGATATCGGCGACACCGCTCGGTTGATGCTGCACACAATCGGCCATCC 
GTATTCGGTTGGCGAAAAAGTGTACGACGTCACCTTCGAGAACGTCCAGGCCGGGTTGCGCAC 

CGACTATCTTTTCCGTATCGCCAACCAGCGCGGGGGAATCGTACTGGGCACCGGGGACCTGTC
GGAGCTGGCACTGGGTTGGTCGACATACGGTGTCGGCGACCAGATGTCGCACTACAACGTCAA 
CGCCGGTGTGCCCAAGACGCTGATCCAGCACCTGATCCGGTGGGTCATTTCGGCGGGTGAGTT 
CGGTGAGAAGGTGGGTGAGGTATTGCAGTCGGTGCTCGACACCGAGATCACCCCCGAACTCAT 
TCCGACCGGCGAGGAGGAGCTGCAGAGCAGCGAGGCCAAGGTCGGACCTTTCGCCCTACAGG 

ACTTTTCGC I I I I I CAGGTACTGCGCTACGGATTTCGCCCGTCGAAGATTGCG I I I I I GGCCTGG
CATGCGTGGAACGATGCGGAGCGGGGCAACTGGCCGCCCGGCTTCCCAAAGAGCGAACGCCC 
GTCCTATTCATTGGCCGAAATCCGGCATTGGCTGCAGATTTTCGTCCAGCGGTTT^ 
GCCAGTTCAAGCGTTCGGCATTGCCCAACGGCCCCAAGGTGTCCCACGGGGGCGCGTTGTCGC 
CGCGTGGGGATTGGCGGGCCCCGTCGGATATGTCAGCGCGAATCTGGCTCGATCAGATCGACC 

GTGAGGTGCCCAAGGGCTAG

>Rv2439c proB glutamate 5-kinase TB.seq 2737118:2738245 MW:38789
>emb|AL123456|MTBH37RV:c2738245-2737115, proB SEQ ID NO:93 

ATGAGAAGTCCGCATCGGGACGCAATCCGGACCGCGCGCGGCCTTGTCGTGAAGGTCGGGAC 
CACGGCGCTTACCACACCGTCCGGGATGTTCGATGCCGGCCGGCTGGCCGGACTGGCCGAGG
CGGTCGAGCGGCGGATGAAGGCGGGTTCCGACGTCGTCATCGTGTCTTCGGGCGCCATCGCC 
GCCGGCATCGAGCCGCTCGGGCTGTCCCGTCGTCCCAAAGATCTGGCGACCAAGCAGGCGGC 
GGCCAGCGTCGGGCAGGTCGCGCTGGTGAACTCGTGGAGCGCGGCGTTCGCCCGCTACGGCC 
GCACGGTGGGCCAGGTGCTGCTGACCGCGCACGACATTTCGATGCGGGTGCAGCACACCAAC 
GCCCAACGCACGCTGGATCGGCTGCGCGCGTTGCACGCGGTGGCGATTGTCAACGAGAACGA
CACCGTGGCCACCAACGAGATCCGGTTCGGTGACAACGATCGGCTGTCTGCACTGGTGGCGCA 
CCTGGTCGGCGCCGACGCTTTGGTGCTGCTGTCGGACATCGACGGCCTCTACGACTGCGACCC 
GCGCAAAACCGCGGACGCGACGTTCATTCCGGAGGTGTCCGGGCCGGCGGATCTGGACGGTG
TGGTCGCCGGCCGCAGTAGCCACCTGGGTACTGGCGGCATGGCGTCCAAGGTGGCGGCGGCG 
CTGTTGGCCGCCGACGCCGGGGTGCCGGTACTGCTGGCCCCCGCGGCCGACGCCGCGACCG 
CGCTCGCCGACGCGTCGGTGGGCACGGTGTTTGCGGCCCGGCCCGCGCGTCTGTCGGCCCGG 
CGGTTCTGGGTGCGTTATGCCGCCGAAGCAACCGGCGCACTGACTCTCGACGCCGGTGCGGTG
CGCGCTGTGGTGCGACAACGCCGGTCACTGCTGGCGGCGGGTATCACCGCGGTGTCCGGCCG 
GTTTTGCGGCGGCGATGTGGTCGAACTGCGTGCACCCGACGCGGCCATGGTAGCCCGCGGGG 
TGGTTGCCTACGACGCGTCCGAGCTGGCCACCATGGTGGGCCGGTCCACCTCTGAGCTACCCG 
GCGAGCTGCGCCGCCCGGTGGTGCACGCCGACGATCTGGTCGCGGTGTCGGCGAAGCAAGCT 
AAGCAAGTTTAG

>Rv2440c obg Obg GTP-binding protein TB.seq 2738248:2739684 MW:50430 
>emb|AL123456|MTBH37RV:c2739684-2738245, obg SEQ ID NO:94 

GTGCCTCGGTTTGTCGATCGGGTCGTCATCCACACCAGAGCGGGTTCGGGCGGTAACGGCTGC 

GCTTCGGTCCATCGCGAGAAATTCAAGCCGCTGGGCGGCCCCGATGGCGGAAATGGCGGCCG
GGGCGGCAGCATCGTCTTCGTCGTCGATCCGCAAGTGCACACCCTGCTCGACTTCCATTTCCGC 
CCGCATCTCACCGCGGCTTCGGGCAAGCACGGGATGGGCAATAACCGCGACGGGGCCGCCGG 
CGCGGATTTGGAAGTGAAAGTTCCCGAAGGCACCGTGGTATTGGACGAGAACGGCCGGCTACT 
GGCCGACCTGGTCGGCGCGGGCACCCGCTTTGAAGCCGCCGCCGGAGGCCGTGGCGGTTTGG 

GCAACGCCGCGCTGGCTTCCCGCGTGCGTAAGGCCCCCGGTTTCGCACTCCTCGGCGAAAAGG
GACAGTCCCGAGACCTCACCTTGGAACTCAAGACCGTCGCCGACGTCGGCCTGGTCGGGTTTC 
CGTCGGCCGGAAAATCCTCGCTGGTGTCGGCGATTTCGGCGGCCAAGCCGAAGATCGCCGACT 
ACCCGTTCACCACCCTGGTGCCCAACCTCGGTGTGGTCTCGGCTGGCGAGCACGCGTTCACCG 
TCGCCGACGTGCCGGGGTTGATCCCGGGCGCATCCCGGGGCCGTGGTCTGGGGCTGGACTTT 

CTGCGGCACATCGAGCGCTGCGCTGTACTGGTGCATGTGGTGGATTGCGCTACCGCCGAGCCG
GGCCGCGACCCCATCTCGGACATCGACGCGCTGGAAACGGAACTCGCGTGCTACACGCCCAC 
GCTGCAAGGGGACGCGGCTCTGGGCGATCTCGCCGCACGGCCGCGTGCGGTGGTCCTCAACA 
AAATCGATGTGCCGGAGGCCCGCGAGCTCGCGGAGTTCGTCCGTGACGACATCGCCCAGCGC 
GGCTGGCCGGTQTTCTGCGTGTCGACCGCAACCCGGGAAAACCTGCAGCCGTTGATCTTTGGG 

CTGTCGCAGATGATCTCGGACTACAACGCTGCGCGGCCGGTGGCGGTGCCACGGCGGCCGGT
GATTCGTCCGATTCCGGTGGACGACAGCGGTTTTACCGTCGAACCCGACGGGCATGGTGGCTT 
TGTCGTCAGCGGTGCCCGGCCCGAGCGTTGGATTGACCAGACCAACTTCGACAACGACGAGGC 
CGTCGGCTATCTCGCCGACCGGCTGGCGCGCCTGGGTGTCGAGGAGGAATTGCTGAGGCTGG 
GTGCGCGGTCAGGATGCGCGGTGACCATCGGCGAGATGACGTTCGATTGGGAGCCGCAAACG 

CCTGCGGGTGAGCCGGTCGCGATGTCCGGCCGGGGCACCGATCCGCGGCTGGACAGCAACAA
GCGGGTGGGCGCGGCCGAGCGAAAGGCCGCTCGGAGTCGGCGTCGCGAACACGGGGATGGC 
TGA 
>Rv2441c rpmA 50S ribosomal protein L27 TB.seq 2739773:2740030 MW:8969
>emb|AL123456|MTBH37RV:c2740030-2739770, rpmA SEQ ID NO:95 

ATGGCACACAAGAAGGGGGCTTCCAGCTCGCGCAACGGTCGCGATTCCGCCGCCCAGCGGCT 
GGGGGTTAAGCGGTACGGCGGCCAGGTCGTCAAGGCCGGCGAGATCCTGGTCCGCCAGCGCG
GTACCAAATTCCATCCCGGCGTCAACGTCGGGCGTGGCGGCGATGACACCTTGTTCGCCAAGA 
CGGCCGGGGCGGTCGAGTTCGGCATCAAACGCGGACGTAAGACGGTGAGCATCGTCGGTTCG 
ACCACTGCCTGA 

>Rv2442c rplU 50S ribosomal protein L21 TB.seq 2740048:2740359 MW:11152
>emb|AL123456|MTBH37RV:c2740359-2740045, rplU SEQ ID NO:96

ATGATGGCGACCTACGCAATCGTCAAGACCGGCGGCAAGCAGTACAAAGTCGCTGTCGGAGAT 
GTGGTCAAGGTCGAAAAGCTGGAATCCGAGCAGGGGGAGAAGGTGTCCCTGCCGGTGGCTCT 
GGTTGTCGACGGCGCCACCGTCACCACCGATGCGAAGGCACTGGCCAAGGTCGCGGTGACCG 
GTGAGGTGCTCGGGCACACCAAGGGCCCCAAGATCCGTATCCACAAGTTCAAGAACAAGACTG
GCTACCACAAACGGCAGGGACACCGTCAGCAGCTGACGGTCCTGAAGGTCACCGGCATCGCAT 
AA 

>Rv2448c valS valyl-tRNA synthase TB.seq 2747596:2750223 MW:97822
>emb|AL123456|MTBH37RV:c2750223-2747593, valS SEQ ID NO:97

ATGCTGCCCAAGTCGTGGGATCCGGCCGCGATGGAGAGCGCCATCTATCAGAAGTGGCTGGAC 
GCTGGCTACTTCACCGCGGACCCGACCAGCACCAAGCCGGCCTATTCGATCGTGCTGCCGCCG 
CCGAACGTGACCGGCAGCCTGCACATGGGCCACGCGCTGGAACACACCATGATGGACGCCTTG 
ACGCGGCGCAAGCGGATGCAGGGCTATGAGGTGCTCTGGCAGCCGGGCACCGACCATGCCGG 

GATCGCCACCCAGAGCGTGGTCGAGCAGCAGCTGGCGGTCGACGGCAAGACTAAAGAAGACCT
CGGCCGCGAGCTGTTCGTGGACAAGGTGTGGGATTGGAAGCGAGAGTCTGGCGGTGCCATCG 
GCGGCCAGATGCGCCGACTCGGTGACGGGGTGGACTGGAGCCGCGACCGGTTCACCATGGAC 
GAAGGTCTGTCGCGGGCGGTGCGCACGATCTTCAAGCGGCTTTATGACGCCGGGCTGATCTAT 
CGGGCCGAGCGGCTGGTCAACTGGTCGCCGGTGCTGCAGACCGCGATCTCCGACCTCGAGGT 

CAACTACCGCGACGTCGAAGGCGAGCTGGTGTCGTTTAGGTACGGCTCGCTTGACGACTCGCA
ACCCCACATCGTGGTCGCCACCACCCGGGTCGAGACGATGCTGGGCGATACCGCGATCGCCGT 
CCATCCCGATGACGAGCGCTACCGTCACCTGGTCGGCACCAGCCTGGCGCACCCATTCGTCGA 
CCGGGAGCTGGCCATTGTCGCCGACGAGCACGTGGACCCTGAATTCGGCACCGGCGCGGTCA 
AAGTCACACCCGCCCACGACCCCAACGACTTCGAAATCGGGGTGCGCCACCAGCTGCCGATGC 

CCTCGATCCTGGACACCAAGGGCCGGATCGTCGACACCGGAACGCGATTCGACGGCATGGACC
GCTTCGAGGCACGGGTCGCGGTGCGCCAAGCGCTCGCGGCCCAGGGCCGCGTGGTCGAAGAA 
AAGCGACCCTACCTGCACAGCGTCGGACACTCCGAACGCAGCGGCGAGCCGATCGAGCCGCG 
GCTATCCCTGCAGTGGTGGGTCCGGGTGGAATCGCTGGCCAAAGCGGCCGGGGATGCGGTGC
GCAACGGGGACACCGTGATTCACCCGGCCAGCATGGAACCCCGCTGGTTCTCCTGGGTCGACG 
ACATGCACGACTGGTGCATCTCGCGACAGCTCTGGTGGGGGCATCGGATCCCGATCTGGTACG 
GACCCGACGGCGAACAGGTGTGCGTCGGCCCGGACGAAACACCCCCGCAGGGCTGGGAACAG 
GATCCTGACGTGCTGGATACCTGGTTTTCGTCGGCGCTGTGGCCGTTTTCCACGCTGGGTTGGC
CGGACAAGACGGCGGAGCTGGAAAAGTTCTATCCGACAAGCGTTCTGGTTACCGGCTATGACAT 
CTTGTTCTTTTGGGTGGCCAGAATGATGATGTTCGGCACCTTCGTCGGCGACGACGCCGCCATC 
ACCCTCGACGGCCGCCGGGGCCCGCAGGTGCCGTTCACCGACGTGTTTCTGCATGGGCTGATC 
CGCGACGAGTCTGGCCGCAAGATGAGCAAGTCCAAGGGCAACGTCATCGACCCGCTGGATTGG 

GTGGAAATGTTCGGGGCCGATGCGCTGCGGTTCACGCTGGCCCGCGGGGCCAGTCCCGGTGG
TGACTTGGCGGTGAGCGAGGATGCCGTGCGGGCGTCGCGCAATTTCGGGACCAAGCTGTTCAA 
CGCCACTCGGTACGCACTGCTCAATGGCGCCGCGCCAGCACCCCTGCCATCGCCGAACGAGCT 
GACCGACGCCGACCGCTGGATTCTCGGAAGGTTGGAAGAGGTTCGGGCCGAAGTTGATTCGGC 
CTTCGACGGATACGAGTTCAGCCGCGCTTGTGAGTCCCTGTATCACTTCGCCTGGGACGAATTC 

TGCGACTGGTACCTCGAACTGGCCAAAACGCAGCTTGCCCAGGGACTCACACACACCACCGCC
GTGCTGGCCGCCGGGCTGGACACGCTGCTGCGCCTGCTGCACCCGGTGATTCCCTTCCTCACC 
GAGGCGCTATGGCTGGCGCTGACCGGCAGGGAATCGCTGGTCAGCGCCGACTGGCCGGAGCC 
TTCCGGGATTAGCGTGGACCTTGTTGCCGCGCAACGGATTAACGATATGCAGAAGTTGGTGACC 
GAAGTGCGGCGGTTCCGCAGCGATCAAGGTCTGGCCGACCGGCAGAAGGTTCCGGCCCGAAT 

GCACGGTGTGCGGGACTCGGATCTGAGCAACCAGGTGGCCGCCGTGACCTCGCTGGCGTGGC
TCACCGAGCCGGGCCCGGATTTTGAGCCGTCGGTCTCGTTGGAGGTTCGGCTCGGCCCCGAGA 
TGAACCGCACCGTCGTCGTCGAGCTCGACACCTCGGGCACCATCGACGTGGCCGCCGAGCGT 
CGCCGCCTGGAAAAGGAGTTGGCCGGCGCCCAAAAGGAGCTGGCGTCGACCGCCGCCAAGTT 
GGCCAACGCGGACTTTCTGGCCAAAGCGCCCGACGCCGTCATTGCCAAGATCCGGGACCGCCA 

GCGCGTGGCGCAGCAGGAAACCGAGCGCATCACCACCCGGTTGGCTGCGCTGCAATGA

>Rv2482c plsB2 TB.seq 2786915:2789281 MW:88284
>emb|AL123456|MTBH37RV:c2789281-2786912, plsB2 SEQ ID NO:98

GTGACCAAACCGGCGGCCGATGCCAGCGCGGTGCTTACTGCCGAGGACACACTGGTGCTGGC 
TTCCACGGCGACGCCGGTCGAGATGGAGCTGATCATGGGCTGGCTGGGCCAGCAGCGTGCAC
GCCATCCGGACTCGAAGTTCGACATATTGAAGCTGCCACCGCGCAACGCTCCGCCGGCGGCGC 
TGACGGCACTGGTCGAGCAGCTCGAGCCCGGCTTCGCATCCAGCCCGCAATCTGGCGAGGAC 
CGTTCTATCGTGCCGGTTCGGGTGATCTGGCTGCCTCCCGCCGATCGCAGCCGGGCGGGCAAG 
GTGGCCGCACTGCTCCCGGGTCGGGATCCCTACCATCCCAGCCAGCGTCAGCAGCGTCGCATC 
CTGCGTACCGATCCCAGGCGCGCGCGGGTGGTGGCCGGCGAGTCGGCCAAGGTGTCCGAACT
GCGCCAGCAGTGGCGCGATACCACGGTGGCAGAGCACAAGCGCGATTTCGCCCAGTTCGTCAG 
CCGCCGAGCGCTGTTGGCGCTGGCGCGCGCCGAATATCGGATCCTTGGACCGCAATACAAATC 
TCCCCGGCTGGTGAAGCCGGAGATGTTGGCGTCCGCACGATTTCGTGCCGGCCTGGACCGGAT
TCCGGGCGCCACGGTCGAAGATGCCGGGAAGATGCTCGACGAACTCTCCACCGGATGGAGCC 
AGGTGTCGGTAGACCTGGTTTCCGTCCTCGGCAGGCTGGCTAGCCGCGGCTTCGATCCGGAAT 
TCGACTACGACGAGTATCAGGTCGCGGCGATGCGCGCCGCACTGGAGGCTCATCCGGCGGTC 
CTGCTGTTCTCGCACCGGTCCTACATCGACGGCGTGGTGGTACCGGTGGCCATGCAGGACAAC
CGGTTACCGCCGGTGCACATGTTCGGCGGCATCAACCTGTCGTTCGGTCTCATGGGACCCCTC 
ATGCGGCGCTCGGGGATGATCTTCATCCGGCGCAATATCGGCAACGACCCACTGTATAAGTACG 
TGCTCAAGGAGTACGTGGGCTACGTGGTCGAGAAGCGGTTCAACCTGAGCTGGTCCATCGAAG 
GCACCCGGTCGCGCACCGGAAAGATGTTGCCGCCCAAGCTCGGTTTGATGAGCTACGTGGCCG 

ATGCTTACCTGGACGGCCGCAGTGACGACATCCTGCTGCAGGGGGTTTCGATTTGCTTCGATCA
GCTGCACGAGATCACCGAATACGCCGCCTACGCGCGTGGCGCGGAGAAGACGCCCGAAGGTT 
TGCGCTGGCTCTACAACTTCATCAAGGCGCAGGGGGAACGCAACTTCGGCAAGATCTACGTTCG 
CTTCCCCGAAGCGGTCTCGATGCGCCAGTACCTCGGCGCACCGCACGGCGAGCTGACCCAGG 
ATCCGGCCGCGAAACGGCTTGCGTTGCAGAAGATGTCGTTCGAGGTGGCCTGGAGGATTTTGC 

AGGCGACGCCGGTGACCGCGACGGGTTTGGTGTCCGCACTGCTGCTCACCACCCGCGGCACC
GCGTTGACGCTCGACCAGCTGCACCACACGTTGCAGGACTCACTGGACTATCTGGAACGCAAA 
CAATCGCCGGTTTCGACAAGCGCATTGCGACTGCGCTCGCGCGAAGGCGTCCGTGCGGCGGC 
GGACGCGTTGTCCAACGGCCACCCGGTCACTCGGGTCGACAGTGGCCGGGAGCCGGTATGGT 
ACATAGCGCCTGACGACGAGCACGCCGCGGCGTTCTACCGGAACTCGGTGATCCATGCG I I I I I 

GGAGACCTCGATCGTCGAGCTCGCGCTGGCCCATGCCAAGCACGCCGAAGGTGACCGCGTCG
CCGCGTTCTGGGCCCAGGCGATGCGGTTGCGGGATCTGCTGAAGTTCGACTTCTATTTCGCGG 
ATTCCACGGCGTTTCGGGCCAACATCGCCCAAGAGATGGCCTGGCACCAAGACTGGGAGGATC 
ATCTTGGCGTCGGGGGCAATGAGATCGACGCGATGCTGTATGCCAAACGGCCGCTGATGTCGG 
ACGCGATGTTGCGGGTCTTCTTCGAAGCCTATGAGATCGTTGCCGACGTGTTGCGCGATGCTCC 

GCCTGACATCGGTCCTGAGGAGTTGACGGAGCTGGCGCTCGGCCTCGGCCGTCAGTTTGTGGC
ACAGGGCCGGGTCCGCAGCAGCGAACCGGTATCGACGCTGCTGTTCGCCACTGCACGCCAGG 
TCGCCGTCGATCAGGAGCTGATAGCGCCGGCGGCCGACCTCGCCGAACGTAGGGTCGCCTTC 
CGGCGGGAGTTACGAAACATTCTGCGGGATTTCGACTATGTCGAGCAGATCGCGCGCAACCAG 
TTCGTCGCCTGCGAGTTCAAAGCGCGTCAAGGACGCGACCGAATCTAA 


>Rv2509 - putative oxidoreductase TB.seq 2824676:2825479 MW:28014 
>emb|AL123456|MTBH37RV:2824676-2825482, Rv2509 SEQ ID NO:99 

ATGCCGATACCCGCGCCCAGCCCCGACGCACGTGCCGTTGTCACCGGGGCTTCGCAGAACATC 
GGCGCGGCGCTGGCCACCGAACTGGCCGCACGCGGGCACCACCTGATCGTCACCGCACGACG 
CGAGGACGTGTTGACCGAGTTGGCTGCCCGGCTGGCCGACAAGTACCGCGTCACGGTCGACG
TGCGACCGGCCGATCTGGCCGATCCGCAAGAACGATCGAAACTGGCCGACGAGCTGGCTGCC 
CGGCCCATCTCGATCCTGTGCGCCAACGCGGGTACCGCGACATTCGGCCCGATCGCATCGCTC 
GATCTTGCCGGCGAAAAGACGCAGGTGCAGTTGAATGCCGTGGCGGTGCACGACCTTACGTTG
GCGGTGTTGCCGGGCATGATCGAGCGCAAGGCCGGCGGCATCTTGATTTCTGGTTCGGCGGCC 
GGCAATTCACCGATTCCCTACAACGCCACCTATGCCGCGACCAAGGCCTTCGTGAACACCTTCA 
GCGAATCTCTGCGCGGTGAGCTACGCGGCTCCGGCGTGCACGTCACGGTGCTGGCCCCGGGC 
CCGGTTCGCACCGAGCTACCGGATGCCTCCGAAGCGTCACTGGTCGAGAAGCTGGTGCCGGAC
TTCCTGTGGATCTCGACGGAGCACACCGCCCGGGTATCGCTGAATGCCTTGGAGCGCAACAAG 
ATGCGCGTCGTTCCGGGTCTGACGTCAAAGGCGATGTCGGTGGCCAGCCAATACGCTCCGCGC 
GCCATCGTGGCGCCAATCGTGGGTGCCTTTTACAAGAGGCTTGGGGGCAGCTAG 

>Rv2524c fas fatty acid synthase TB.seq 2840124:2849330 MW:326226
>emb|AL123456|MTBH37RV:c2849330-2840121, fas SEQ ID NO:100 

GTGACGATCCACGAGCACGACCGGGTGTCCGCTGATCGCGGCGGGGACAGCCCGCATACCAC 
CCACGCTCTGGTCGATCGCCTCATGGCTGGTGAGCCCTACGCTGTCGCATTCGGTGGCCAGGG 
CAGCGCCTGGCTGGAAACCCTCGAAGAGCTGGTGTCGGCCACCGGGATAGAAACCGAGTTGGC 

GACGTTGGTCGGTGAGGCAGAGCTGTTGCTCGATCCGGTCACCGACGAGCTGATTGTGGTGCG
CCCGATCGGTTTCGAGCCGCTGCAATGGGTACGCGCACTGGCGGCCGAGGACCCGGTTCCGT 
CCGACAAGCACCTGACGTCGGCCGCCGTGTCGGTGCCCGGCGTGTTGCTTACCCAGATCGCGG 
CGACCCGGGCGCTGGCCCGTCAAGGCATGGACCTCGTGGCCACCCCGCCGGTCGCCATGGCG 
GGGCATTCGCAAGGTGTGCTGGCGGTGGAAGCCCTCAAGGCTGGTGGGGCACGCGACGTCGA 

GCTGTTTGCCTTGGCCCAGTTGATCGGTGCCGCCGGAACGCTGGTGGCCCGCCGGCGCGGAA
TTTCCGTCCTGGGCGATCGCCCGCCGATGGTATCGGTCACCAACGCCGACCCCGAGCGCATCG 
GCCGGTTGCTCGACGAGTTCGCCCAGGACGTGCGCACGGTGCTGCCACCGGTGTTGTCCATCC 
GCAACGGCCGGCGTGCCGTCGTCATCACCGGCACCCCCGAGCAGCTGTCGCGTTTCGAGCTTT 
ATTGCCGCCAGATCTCCGAGAAGGAAGAAGCCGACCGCAAGAACAAGGTCCGCGGCGGCGAC 

GTCTTCTCGCCGGTCTTCGAGCCGGTGCAGGTGGAGGTGGGCTTTCACACCCCGCGGCTATCC
GACGGGATCGACATCGTCGCGGGCTGGGCCGAGAAGGCGGGCCTCGATGTCGCCTTGGCTCG 
GGAGCTGGCCGATGCCATCTTGATCAGAAAGGTCGACTGGGTCGACGAGATCACCCGTGTCCA 
CGCGGCCGGCGCCCGCTGGATCCTCGACCTGGGGCCGGGCGACATCCTGACCCGACTGACCG 
CACCGGTGATCCGCGGCCTGGGCATCGGCATCGTGCCGGCGGCTACCCGCGGTGGCCAGCGC 

AACCTGTTCACCGTCGGCGCCACCCCCGAGGTTGCCCGGGCCTGGTCGAGCTACGCACCGACC
GTGGTTCGCCTCCCCGACGGCAGGGTCAAGCTCTCGACGAAGTTCACCCGGCTGACCGGCCGC 
TCGCCGATCCTGCTCGCGGGCATGACCCCGACCACCGTGGACGCCAAGATCGTCGCCGCGGC 
GGCCAACGCCGGGCACTGGGCCGAGCTGGCCGGCGGCGGGCAGGTCACCGAAGAGATCTTC 
GGTAACCGCATCGAACAAATGGCCGGCCTGCTCGAGCCGGGCCGCACCTATCAGTTCAACGCG 

CTGTTCCTCGATCCCTACCTGTGGAAGCTTCAGGTGGGCGGCAAGCGGTTGGTGCAGAAGGCC
CGCCAGTCCGGCGCCGCGATCGACGGCGTGGTGATCAGCGCCGGCATCCCAGACCTCGACGA 
GGCCGTCGAGCTGATCGACGAACTGGGCGACATCGGCATCAGCCACGTCGTGTTCAAACCCGG 
GACCATCGAGCAGATCCGCTCGGTGATTCGCATCGCCACCGAGGTGCCCACCAAGCCGGTGAT
CATGCACGTCGAGGGCGGGCGCGCCGGCGGGCACCATTCCTGGGAGGATCTCGACGACCTGC 
TGCTGGCTACCTACTCGGAGTTGCGCTCACGCGCCAACATCACGGTGTGCGTCGGCGGCGGCA 
TTGGCACCCCGAGAAGGGCTGCGGAATATTTGTCCGGGCGCTGGGCGCAGGCCTACGGCTTCC 
CATTGATGCCGATCGACGGCATCCTGGTCGGCACCGCGGCGATGGCCACCAAGGAATCCACCA
CGTCGCGATCGGTCAAGCGGATGCTCGTCGACACTCAGGGCACCGACCAATGGATCAGCGCCG 
GAAAAGCGCAGGGCGGCATGGCCTCCAGCCGCAGTCAGCTCGGTGCCGATATCCACGAGATC 
GACAACAGCGCATCCCGGTGCGGGCGGCTGCTCGACGAGGTGGCCGGTGACGCGGAGGCGG 
TCGCGGAGCGTCGCGACGAGATCATCGCGGCGATGGCCAAGACCGCCAAGCCCTACTTCGGC 

GACGTCGCCGACATGACCTACCTGCAGTGGCTGCGGCGCTACGTCGAACTGGCCATCGGGGAA
GGCAACTCGACCGCCGACACCGCCTCGGTGGGCAGCCCGTGGCTGGCCGACACCTGGCGGGA 
CCGCTTCGAGCAGATGCTGCAGCGTGCCGAAGCCCGGTTGCACCCACAGGATTTCGGCCCGAT 
CCAGACGCTATTCACCGATGCTGGCCTGCTGGACAATCCGCAGCAGGCGATCGCCGCCCTGCT 
GGCGCGCTACCCCGACGCCGAGACCGTGCAGTTGCATCCCGCGGATGTGCCC l I I I ICGTGAC 

GTTGTGCAAGACGCTGGGCAAGCCGGTCAACTTCGTGCCGGTGATCGACCAGGACGTGCGGC
GCTGGTGGCGCAGCGACTCGCTGTGGCAGGCCCACGACGCCCGCTACGACGCCGATGCGGTG 
TGCATCATTCCGGGCACCGCGTCGGTAGCCGGCATCACCCGGATGGATGAACCCGTCGGTGAG 
TTGCTGGACCGTTTCGAGCAAGCCGCAATCGATGAAGTGCTCGGCGCCGGTGTCGAGCCGAAG 
GATGTCGCGTCGCGCCGGCTGGGCCGCGCCGACGTGGCCGGACCGTTGGCTGTCGTCCTCGA 

CGCACCCGATGTGCGCTGGGCCGGTCGCACCGTGACCAACCCGGTGCATCGGATCGCCGACC
CGGCCGAATGGCAGGTGCACGATGGACCCGAAAACCCGCGCGCCACACACTCATCCACCGGC 
GCCCGGCTGCAGACGCACGGCGACGACGTCGCCTTGAGCGTGCCCGTCTCGGGCACCTGGGT 
CGACATCCGATTCACGTTGCCGGCCAACACCGTCGATGGCGGCACCCCGGTGATCGCCACCGA 
GGACGCCACCAGCGCCATGCGCACGGTGCTGGCGATCGCCGCCGGTGTCGACAGCCCGGAGT 

TCTTGCCTGCGGTGGCCAACGGGACGGCCACTTTGACGGTGGACTGGCACCCCGAGCGTGTTG
CCGACCACACCGGCGTCACCGCCACGTTCGGTGAGCCGCTGGCACCCAGCCTCACCAACGTG 
CCCGACGCGCTCGTCGGCCCTTGTTGGCCAGCGGTTTTCGCGGCCATCGGATCGGCGGTCACC 
GACACCGGTGAGCCGGTGGTGGAAGGCCTGCTGAGCCTGGTGCATCTGGACCACGCCGCCCG 
CGTGGTCGGTCAGCTGCCCACGGTCCCGGCCCAATTGACCGTCACCGCAACGGCTGCCAACGC 

AACCGATACGGACATGGGCCGCGTCGTGCCGGTCTCGGTCGTCGTTACCGGCGCCGATGGCG
CCGTGATCGCCACTCTCGAGGAGCGATTCGCGATCCTGGGTCGCACCGGTTCCGCCGAGCTCG 
CCGACCCGGCGCGAGCCGGTGGCGCGGTGTCGGCGAACGCCACCGACACCCCGCGCCGTCG 
CCGCCGCGACGTCACGATCACCGCGCCGGTCGACATGCGCCCGTTCGCGGTGGTGTCCGGCG 
ACCACAACCCCATTCACACCGACCGGGCCGCCGCGCTGCTTGCCGGCCTGGAGTCGCCGATC 

GTGCACGGCATGTGGCTGTCGGCCGCGGCGCAACACGCGGTGACCGCCACCGACGGGCAGG
CCCGGCCACCGGCCCGGCTGGTCGGCTGGACCGCGCGGTTTTTGGGCATGGTGCGCCCCGGC 
GACGAGGTGGACTTCCGCGTCGAGCGCGTCGGAATCGACCAGGGCGCAGAGATTGTGGACGT 
GGCCGCGCGCGTCGGGTCGGATCTAGTGATGTCGGCCTCCGCGCGACTGGCCGCACCCAAGA
CGGTCTACGCATTCCCCGGCCAGGGCATCCAACACAAGGGCATGGGCATGGAGGTGCGCGCC 
CGCTCCAAGGCGGCCCGCAAGGTGTGGGACACCGCGGACAAGTTCACCCGCGACACCCTGGG 
CTTCTCGGTACTGCACGTGGTCCGCGACAACCCGACCAGCATCATCGCCAGCGGTGTGCACTA 
CCACCACCCCGACGGGGTGCTCTACCTGACGCAGTTCACCCAGGTCGCGATGGCGACGGTGG
CGGCCGCGCAGGTCGCCGAGATGCGTGAACAGGGAGCCTTCGTCGAAGGCGCCATCGCGTGC 
GGCCACTCGGTCGGCGAGTACACCGCGCTGGCCTGCGTGACCGGCATCTACCAACTGGAAGC 
CTTGCTGGAGATGGTGTTTCACCGCGGGTCGAAGATGCACGACATCGTTCCGCGCGACGAGCT 
CGGCCGCTCCAACTATCGGCTGGCGGCCATCCGGCCGTCCCAGATCGACCTCGACGACGCCG 

ACGTGCCCGCGTTCGTCGCCGGGATCGCGGAGAGCACCGGTGAATTCCTGGAGATCGTGAATT
TCAACCTGCGTGGCTCGCAATACGCGATCGCGGGCACGGTACGCGGCCTCGAGGCGCTCGAG 
GCCGAGGTGGAGCGGCGCCGCGAGCTCACCGGCGGCCGACGGTCGTTCATTTTGGTGCCCGG 
CATCGATGTTCCGTTCCACTCGCGAGTGCTGCGGGTCGGGGTGGCCGAATTCCGGCGCTCGCT 
GGACCGGGTCATGCCGCGCGACGCGGACCCCGACCTGATCATCGGGCGCTACATTCCCAACCT 

GGTGCCGCGGTTGTTCACCCTGGACCGCGACTTCATCCAGGAAATCCGGGATTTGGTGCCCGC
CGAGCCGCTCGACGAGATCCTCGCCGACTACGACACCTGGCTTCGCGAGCGTCCGCGCGAGAT 
GGCGCGCACGGTGTTCATCGAGCTGCTGGCATGGCAATTCGCCAGCCCGGTGCGCTGGATCGA 
GACGCAGGATCTGCTGTTCATCGAGGAGGCCGCCGGCGGGCTGGGTGTGGAGCGATTCGTCG 
AGATCGGTGTGAAGAGCTCACCGACGGTGGCGGGTCTTGCCACCAACACCCTCAAACTGCCCG 

AATACGCCCACAGCACAGTGGAAGTGCTCAACGCCGAGCGTGATGCCGCGGTGCTGTTCGCCA
CCGACACCGACCCGGAGCCGGAGCCGGAGGAAGACGAGCCGGTCGCGGAATCGCCCGCGCC 
GGACGTCGTCTCGGAAGCCGCCCCCGTCGCGCCGGCCGCTTCGTCGGCGGGCCCGCGTCCCG 
ACGATCTGGTTTTCGACGCCGCCGATGCCACGCTGGCGCTGATCGCGCTCTCGGCCAAGATGC 
GCATCGACCAGATCGAAGAACTCGACTCCATCGAGTCCATCACCGACGGTGCGTCGTCGCGGC 

GCAACCAGCTGCTGGTGGACCTGGGCTCCGAGCTGAACCTCGGTGCCATTGACGGCGCCGCC
GAATCGGACCTGGCCGGTCTGCGCTCACAGGTGACCAAACTGGCGCGCACCTACAAGCCTTAC 
GGCCCAGTGCTTTCCGACGCCATCAACGACCAGCTTCGCACCGTCCTCGGACCGTCGGGCAAG 
CGGCCCGGCGCCATCGCCGAGCGGGTGAAGAAGACCTGGGAGCTCGGTGAGGGCTGGGCCA 
AGCATGTCACCGTCGAGGTCGCGCTGGGCACCCGCGAGGGCAGCAGCGTTCGCGGCGGCGCC 

ATGGGCCACCTGCACGAGGGCGCGCTGGCCGATGCCGCCTCCGTCGACAAGGTCATCGACGC
GGCGGTCGCATCGGTGGCCGCGCGCCAGGGCGTTTCGGTAGCGCTGCCGTCGGCCGGTAGTG 
GTGGCGGCGCCACCATCGACGCGGCCGCGCTCAGCGAGTTCACCGACCAAATCACCGGCCGT 
GAGGGCGTGCTGGCCTCCGCGGCCCGCCTGGTGCTGGGGCAGCTGGGACTGGACGACCCCGT 
CAACGCCTTGCCGGCCGCCCCCGATTCCGAGCTGATCGACTTGGTCACCGCCGAACTGGGAGC 

GGACTGGCCGCGGTTGGTGGCACCGGTGTTCGACCCCAAGAAGGCCGTCGTATTCGACGACC
GCTGGGCCAGCGCCCGCGAGGACCTGGTGAAGCTGTGGCTGACCGACGAGGGCGACATCGAC 
GCCGACTGGCCGCGCCTGGCGGAGCGCTTCGAGGGTGCCGGCCACGTCGTGGCGACCCAGG 
CTACCTGGTGGCAAGGTAAGTCGCTGGCCGCGGGCCGGCAGATCCATGCATCGCTGTACGGCC
GCATCGCCGCCGGCGCCGAGAACCCCGAACCCGGCCGCTACGGCGGCGAAGTTGCCGTGGTG 
ACCGGCGCTTCGAAGGGTTCGATCGCCGCGTCGGTGGTGGCTCGGCTGCTCGACGGCGGAGC 
CACCGTCATCGCGACCACCTCCAAGCTCGACGAGGAGCGGCTGGCGTTCTACCGCACGCTGTA 
TCGCGACCACGCCCGTTACGGCGCGGCGCTGTGGCTGGTCGCGGCGAACATGGCGTCCTACT
CCGACGTCGACGCCCTGGTCGAATGGATCGGCACCGAACAGACCGAAAGCCTTGGGCCGCAGT 
CGATTCACATCAAAGACGCGCAGACCCCGACGCTGCTGTTCCCGTTCGCGGCGCCACGCGTGG 
TCGGGGACCTGTCGGAGGCCGGTTCGCGCGCCGAGATGGAGATGAAAGTGCTGCTGTGGGCC 
GTGCAACGGCTGATCGGCGGCCTGTCGACGATCGGCGCCGAACGCGACATCGCGTCGCGGCT 

GCACGTGGTGCTGCCCGGCTCGCCCAACCGTGGCATGTTCGGCGGCGACGGCGCCTACGGCG
AAGCCAAGTCCGCGCTGGATGCCGTGGTGAGCCGCTGGCACGCCGAGTCGTCCTGGGCGGCA 
CGGGTCAGCCTGGCGCACGCGCTCATCGGCTGGACCCGCGGCACCGGGCTGATGGGCCACAA 
CGATGCCATCGTGGCCGCCGTCGAAGAGGCCGGGGTCACCACCTACTCGACCGACGAGATGG 
CGGCGCTGCTGCTCGACCTGTGTGATGCGGAATCCAAGGTGGCTGCGGCGCGTTCGCCGATCA 

AGGCCGACCTGACCGGGGGCCTGGCCGAGGCCAACCTCGACATGGCCGAGCTGGCGGCCAAG
GCGCGCGAGCAGATGTCGGCAGCGGCGGCCGTCGACGAGGACGCCGAGGCCCCTGGCGCCA 
TCGCCGCGCTGCCGTCGCCGCCCCGGGGTTTCACCCCCGCACCGCCGCCGCAATGGGACGAC 
CTCGATGTCGACCCGGCCGACCTGGTGGTGATCGTCGGCGGCGCCGAAATCGGCCCGTACGG 
CTCGTCACGCACCCGGTTCGAGATGGAGGTCGAAAACGAGCTGTCGGCGGCCGGCGTGCTGG 

AGCTGGCCTGGACCACTGGGTTGATCCGCTGGGAGGACGACCCGCAACCCGGTTGGTACGACA
CCGAATCCGGCGAAATGGTCGACGAATCCGAGTTGGTGCAGCGCTACCACGACGCCGTGGTGC 
AGCGCGTCGGCATTCGCGAATTCGTTGATGACGGCGCGATCGACCCCGACCACGCCTCGCCGC 
TGCTGGTGTCGGTGTTCCTGGAGAAGGACTTCGCGTTCGTGGTGTCCTCGGAGGCCGATGCGC 
GCGCCTTCGTCGAGTTCGATCCCGAGCACACGGTCATCCGGCCGGTGCCCGACTCCACCGACT 

GGCAGGTCATCCGCAAGGCCGGCACCGAGATCCGGGTGCCGCGAAAGACCAAGCTGTCCCGC
GTCGTCGGCGGCCAGATCCCGACCGGGTTCGACCCGACGGTGTGGGGCATCAGCGCAGACAT 
GGCCGGTTCCATCGACCGGTTGGCGGTATGGAACATGGTGGCGACCGTCGACGCGTTCCTGTC 
GTCCGGTTTCAGCCCGGCCGAGGTGATGCGTTACGTGCACCCGAGTTTGGTGGCCAACACCCA 
GGGCACCGGCATGGGCGGCGGCACGTCGATGCAGACGATGTACCACGGCAATCTGTTGGGCC 

GCAACAAGCCGAACGACATCTTCCAGGAAGTCTTGCCGAATATCATTGCCGCGCACGTGGTTCA
GTCCTACGTCGGTAGCTACGGTGCGATGATCCACCCGGTAGCCGCGTGCGCCACCGCCGCGGT 
GTCGGTCGAGGAAGGTGTCGACAAGATCCGGTTGGGCAAGGCTCAACTGGTGGTGGCCGGCG 
GCCTGGATGACCTGACGCTGGAGGGCATCATCGGATTCGGTGACATGGCCGCCACCGCCGACA 
CGTCCATGATGTGCGGCCGCGGCATCCACGACTCGAAGTTTTCCCGGCCCAACGACCGCCGCC 

GTCTGGGCTTCGTCGAAGCCCAAGGCGGCGGGACGATCCTGTTGGCCCGCGGGGACCTGGCG
CTGCGGATGGGGCTGCCGGTGCTGGCGGTGGTGGCGTTCGCGCAGTCGTTCGGCGACGGCGT 
GCACACCTCGATCCCGGCCCCGGGCCTGGGCGCGCTGGGGGCGGGCCGCGGCGGCAAGGAT 
TCACCGCTGGCGCGGGCGCTGGCCAAGCTGGGCGTGGCCGCCGACGACGTGGCGGTCATCTC
CAAGCACGACACCTCGACGCTGGCCAACGATCCCAACGAGACCGAGTTGCATGAACGGCTCGC 
CGACGCCCTGGGCCGTTCCGAGGGCGCCCCGCTGTTCGTGGTGTCGCAGAAGAGCCTGAGCG 
GCCACGCCAAGGGCGGCGCGGCGGTCTTCCAGATGATGGGGCTCTGCCAGATATTGCGGGAT 
GGGGTGATCCCACCCAACCGCAGCCTCGACTGCGTCGACGACGAGCTGGCCGGCTCCGCGCA
TTTCGTGTGGGTGCGTGACACGTTGCGGCTCGGCGGCAAGTTCCCACTCAAGGCCGGCATGCT 
GACCAGCCTCGGGTTCGGCCATGTGTCGGGCCTGGTCGCGTTGGTGCATCCGCAGGCGTTCAT 
CGCCTCGCTGGATCCCGCACAGCGCGCGGACTACCAGCGGCGTGCCGACGCCCGCCTGCTGG 
CCGGTCAGCGCCGGCTGGCCTCGGCGATTGCCGGTGGTGCGCCGATGTACCAGCGGCCCGGT 
GACCGTCGCTTCGACCACCACGCGCCCGAGCGGCCGCAGGAGGCGTCGATGCTGCTGAATCC
GGCGGCCCGGCTGGGTGACGGCGAGGCGTATATCGGCTGA 

>Rv2555c alaS alanyl-tRNA synthase TB.seq 2873772:2876483 MW:97326 
>emb|AL123456|MTBH37RV:c2876483-2873769, alaS SEQ ID NO:101 

GTGCAGACACACGAGATCAGGAAGCGGTTCCTCGATCATTTCGTGAAGGCGGGCCACACCGAG
GTGCCCAGCGCCTCGGTGATCCTCGACGACCCCAACCTGTTGTTCGTCAACGCCGGGATGGTC 
CAGTTCGTGCCTTTCTTCTTGGGACAGCGCACGCCGCCGTACCCGACGGCCACCAGCATCCAG 
AAGTGCATCCGTACCCCCGATATCGACGAGGTGGGCATAACCACCCGGCACAACACGI I I I I IC 
AGATGGCCGGCAATTTCAGCTTCGGCGACTATTTCAAACGCGGGGCCATTGAACTGGCCTGGG 

CACTGCTGACCAACAGCCTCGCCGCCGGCGGCTACGGCCTGGACCCGGAAAGAATCTGGACG
ACAGTCTATTTCGACGACGACGAAGCTGTCCGGCTATGGCAGGAGGTTGCCGGGCTGCCGGCG 
GAGCGAATCCAGCGCCGCGGCATGGCCGACAACTACTGGTCGATGGGCATTCCCGGACCGTG 
CGGGCCGTCATCGGAGATCTATTACGACCGCGGACCCGAATTCGGTCCCGCAGGCGGTCCCAT 
CGTCAGCGAAGACCGCTACCTCGAGGTCTGGAACCTGGTGTTCATGCAGAACGAGCGCGGAGA 

GGGAACCACCAAGGAGGACTACCAGATCCTCGGGCCGCTGCCCCGCAAGAACATCGACACCG
GCATGGGCGTCGAGCGGATCGCGCTGGTGCTGCAAGACGTGCACAACGTCTACGAGACCGAC 
CTGCTCAGGCCGGTCATCGATACCGTGGCCAGGGTCGCCGCGCGTGCCTACGACGTCGGCAA 
CCACGAAGACGACGTGCGGTACCGCATCATCGCAGACCACAGCCGCACCGCCGCGATCCTGAT 
CGGTGACGGCGTCAGCCCCGGCAACGACGGTCGCGGTTATGTGCTGCGCCGGCTGCTGCGTC 

GGGTGATCCGCTCCGCCAAGCTGCTGGGCATCGACGCTGCGATCGTTGGCGACCTGATGGCCA
CGGTGCGCAACGCGATGGGCCCGTCATATCCCGAACTCGTCGCCGACTTCGAGCGGATCAGCC 
GGATCGCGGTCGCCGAGGAGACGGCGTTCAACCGCACGCTGGCGTCGGGTTCCAGGCTGTTC 
GAGGAGGTGGCTAGCTCCACCAAGAAATCCGGAGCCACCGTGCTGTCCGGATCGGACGCTTTC 
ACGTTGCATGACACCTACGGGTTCCCGATCGAGCTCACGCTGGAGATGGCGGCCGAAACCGGT 

CTGCAGGTAGACGAAATCGGGTTCCGTGAGCTGATGGCCGAGCAGCGCCGCCGTGCCAAGGC
CGACGCCGCCGCGCGCAAACACGCGCATGCTGACCTGAGCGCCTACCGCGAGCTGGTTGACG 
CCGGCGCCACCGAGTTCACCGGATTCGACGAGTTGCGTTCCCAGGCGCGGATTCTGGGCATCT 
TCGTCGACGGTAAGCGGGTTCCGGTGGTGGCGCACGGTGTAGCCGGCGGAGCCGGGGAAGG
GCAGCGTGTCGAACTTGTCTTAGATCGCACCCCGCTCTACGCCGAATCGGGTGGGCAGATCGC 
CGATGAGGGCACCATCAGCGGAACCGGTTCCAGCGAAGCTGCCCGGGCCGCGGTTACCGACG 
TGCAGAAGATCGCCAAAACGCTTTGGGTGCACCGAGTCAACGTGGAATCCGGGGAATTCGTCG 
AGGGTGACACCGTAATCGCGGCGGTGGATCCCGGGTGGCGCCGGGGTGCCACGCAGGGCCA
CTCGGGCACCCACATGGTGCATGCCGCGCTGCGACAAGTGCTGGGGCCCAACGCGGTTCAGG 
CGGGATCGCTGAACCGGCCGGGATATTTGCGCTTCGACTTTAACTGGCAGGGTCCGTTGACCG 
ACGACCAGCGCACCCAGGTCGAAGAGGTCACCAACGAGGCCGTGCAAGCGGACTTCGAGGTG 
CGCACGTTCACCGAACAGCTCGACAAGGCCAAGGCGATGGGTGCCATCGCGCTGTTCGGCGAG 

AGCTACCCCGACGAAGTGCGGGTGGTGGAGATGGGTGGACCGTTCTCGCTGGAGCTATGTGGC
GGCACCCATGTGAGCAACACGGCGCAGATCGGTCCCGTGACGATCCTGGGCGAGTCGTCGATC 
GGCTCCGGGGTGCGCCGGGTGGAGGCCTACGTGGGGTTGGATTCGTTTCGTCACCTGGCCAA 
GGAGCGTGCGTTGATGGCCGGGTTGGCCTGGTCACTGAAGGTGCCGTCCGAAGAGGTACCGG 
CCCGGGTGGCCAATCTAGTGGAGCGCCTGCGGGCCGCCGAGAAGGAACTCGAACGTGTCCGG 

ATGGCCAGCGCCCGGGCAGCCGCCACCAATGCCGCCGCCGGGGCTCAGCGGATCGGTAACGT
CCGTTTGGTGGCGCAGCGAATGTCCGGCGGGATGACCGCGGCAGACCTGCGGTCGTTGATCG 
GCGACATCCGCGGCAAGCTGGGTAGCGAGCCGGCGGTGGTGGCGCTGATTGCCGAGGGCGAA 
AGCCAAACTGTGCCGTATGCGGTCGCGGCCAATCCCGCTGCCCAGGACCTCGGAATCCGTGCC 
AACGACCTGGTCAAACAACTTGCGGTGGCGGTCGAAGGCCGCGGTGGCGGTAAGGCGGACCT 

GGCGCAGGGCTCGGGAAAGAATCCGACCGGTATCGACGCCGCGCTCGACGCGGTCCGCTCCG
AGATCGCCGTGATAGCGCGGGTCGGTTGA 

>Rv2580c hisS histidyl-tRNA synthase TB.seq 2904822:2906090 MW:45118
>emb|AL123456|MTBH37RV:c2906090-2904819, hisS SEQ ID NO:102 

GTGACGGAATTCTCGTCATTTTCGGCCCCCAAGGGGGTACCGGACTACGTCCCGCCCGACTCG
GCGCAGTTCGTCGCGGTGCGCGACGGGCTGCTCGCGGCGGCCCGTCAAGCCGGCTATAGCCA 
CATCGAGCTGCCCATCTTCGAGGACACCGCCCTGTTCGCCCGGGGCGTGGGTGAATCCACCGA 
CGTGGTGTCCAAGGAGATGTATACGTTCGCCGACCGTGGCGACCGCTCGGTGACGCTGCGGCC 
CGAGGGCACCGCCGGGGTGGTGCGTGCGGTGATCGAACACGGGCTGGATCGCGGCGCGCTG 

CCGGTGAAGTTGTGTTATGCGGGCCCGTTTTTCCGCTACGAGCGTCCGCAGGCCGGCCGGTAT
CGCCAGTTACAGCAAGTCGGGGTGGAGGCGATCGGCGTCGACGACCCGGCGTTGGACGCCGA 
GGTGATCGCCATTGCCGACGCCGGGTTCCGCTCGTTGGGTCTCGACGGGTTCCGGCTGGAAAT 
CACCTCCCTGGGAGACGAGAGTTGCCGTCCGCAGTACCGGGAACTGTTGCAGGAGTTCTTGTTT 
GGACTCGATCTCGACGAGGACACCCGCAGGCGCGCAGGGATCAATCCGCTGCGGGTGCTCGA 

CGACAAGCGACCCGAATTGCGTGCGATGACGGCGTCGGCGCCGGTGTTGCTGGATCATCTGTC
TGATGTCGCCAAGCAGCATTTCGACACCGTGCTCGCCCATCTGGACGCGCTTGGAGTGCCCTAT 
GTCATCAACCCGCGCATGGTGCGCGGCCTGGACTACTACACCAAGACCGCCTTCGAGTTCGTC 
CATGACGGGCTTGGTGCGCAATCGGGGATCGGCGGCGGGGGGCGCTACGACGGCCTGATGCA
CCAGCTTGGCGGGCAGGACTTGTCGGGCATCGGGTTCGGGCTGGGCGTGGACCGGACCGTGC 
TGGCGCTGCGGGCCGAGGGCAAGACGGCGGGGGACAGCGCCCGGTGCGACGTGTTCGGCGT 
GCCGCTTGGCGAGGCGGCCAAGCTCAGGCTGGCGGTGCTGGCTGGACGACTGCGCGCGGCC 
GGGGTGCGGGTTGACCTTGCCTATGGTGATCGCGGGCTCAAAGGCGCGATGCGCGCGGCCGC
TCGTTCCGGCGCCCGTGTTGCGTTGGTAGCGGGCGACCGCGACATCGAGGCCGGGACGGTCG 
CAGTGAAGGACTTGACGACGGGTGAGCAAGTTTCGGTCTCGATGGATTCGGTTGTGGCCGAAG 
TAATTTCGCGGCTGGCTGGGTAG 

>Rv2614c thrS threonyl-tRNA synthase TB.seq 2941190:2943265 MW:77123
>emb|AL123456|MTBH37RV:c2943265-2941187, thrS SEQ ID NO:103

ATGAGCGCCCCCGCACAACCCGCCCCGGGAGTCGATGGCGGCGACCCGTCGCAAGCCCGAAT 
TCGGGTTCCTGCCGGGACCACCGCGGCCACCGCCGTCGGCGAAGCGGGTTTACCGCGGCGCG 
GTACGCCCGATGCGATCGTCGTCGTGCGCGACGCCGACGGCAACCTGCGCGACCTGAGCTGG 

GTGCCCGACGTCGACACCGATATCACGCCGGTGGCCGCCAACACCGACGACGGTCGCAGCGT
GATCCGCCATTCGACCGCGCACGTGTTGGCCCAAGCCGTCCAAGAGCTGTTTCCGCAGGCCAA 
GCTCGGCATCGGACCACCCATCACCGACGGCTTCTACTACGACTTCGACGTGCCCGAGCCGTT 
CACGCCCGAGGACTTGGCGGCGCTGGAAAAGCGGATGCGCCAGATCGTCAAGGAAGGCCAGC 
TGTTCGACCGGCGGGTCTACGAATCCACCGAACAGGCCCGCGCCGAGCTGGCCAACGAGCCC 

TACAAGCTGGAACTCGTCGACGACAAATCGGGTGACGCCGAGATCATGGAGGTCGGCGGTGAC
GAGCTCACCGCGTACGACAACCTCAACCCCCGCACCCGCGAGCGCGTCTGGGGCGACCTGTG 
CCGCGGACCGCACATCCCGACCACCAAACACATCCCGGCGTTCAAGCTCACCCGCAGCTCGGC 
CGCCTACTGGCGGGGCGATCAGAAAAACGCCAGCCTGCAACGGATCTACGGCACCGCGTGGG 
AATCCCAGGAGGCGCTCGACAGGCACCTGGAGTTCATCGAAGAGGCGCAGCGCCGCGACCAC 

CGCAAGCTGGGTGTCGAGCTGGACCTGTTCAGCTTCCCCGACGAAATCGGTTCCGGCCTAGCG
GTTTTCCACCCCAAGGGCGGCATCGTGCGTCGCGAACTGGAGGACTACTCGCGGCGCAAGCAC 
ACCGAGGCGGGCTACCAGTTCGTCAACAGCCCGCACATCACCAAGGCCCAGTTGTTCCACACC 
TCGGGACATCTGGACTGGTACGCCGACGGCATGTTCCCCCCGATGCACATCGACGCGGAGTAC 
AACGCCGACGGCTCGCTGCGCAAACCCGGCCAGGACTACTACCTCAAGCCGATGAACTGCCCG 

ATGCACTGCCTGATCTTCCGCGCGCGCGGGCGATCCTATCGGGAACTGCCGTTGCGGCTCTTC
GAGTTCGGCACGGTGTATCGCTACGAGAAGTCCGGTGTGGTGCACGGGTTGACCCGGGTGCGT 
GGGCTGACCATGGACGACGCGCACATCTTCTGCACCCGCGACCAGATGCGCGACGAGCTGCG 
GTCGCTGCTGCGGTTTGTGCTCGACCTGCTCGCCGACTACGGCCTCACCGACTTCTACCTCGAA 
CTGTCCACCAAGGACCCGGAGAAGTTCGTCGGCGCCGAGGAGGTCTGGGAGGAAGCCACCAC 

CGTGCTGGCCGAGGTGGGCGCCGAATCCGGGCTGGAGCTGGTGCCCGATCCAGGCGGCGCG
GCGTTCTACGGGCCCAAGATTTCAGTGCAGGTCAAAGACGCGCTGGGCCGCACCTGGCAGATG 
TCGACCATCCAGCTGGACTTCAACTTTCCGGAACGTTTCGGGCTGGAGTACACCGCCGCCGACG 
GAACCCGCCACCGCCCGGTGATGATCCACCGCGCGCTATTTGGGTCGATCGAGCGGTTCTTCG

GCATTCTCACCGAGCACTACGCGGGGGCGTTCCCGGCCTGGTTGGCGCCCGTGCAGGTGGTC 

GGCATCCCGGTCGCCGATGAGCACGTCGCCTATCTGGAAGAGGTTGCCACGCAACTGAAGTCG 

CACGGGGTGCGGGCCGAGGTGGACGCCAGCGACGATCGGATGGCCAAGAAGATCGTGCACCA 

CACCAACCACAAGGTGCCGTTCATGGTGTTGGCGGGTGATCGTGACGTCGCCGCCGGCGCGGT 

GAGTTTCCGGTTCGGTGACCGCACCCAAATCAACGGTGTGGCCCGTGACGATGCGGTGGCGGC 

CATTGTCGCCTGGATCGCTGACCGCGAAAATGCGGTTCCTACAGCGGAACTGGTGAAAGTGGC 

CGGTCGTGAGTGA 

>Rv2697c dut deoxyuridine triphosphatase TB.seq 3013683:3014144 MW:15772 
>emb|AL123456|MTBH37RV:c3014144-3013680, dut SEQ ID NO:104 

GTGTCGACCACTCTGGCGATCGTCCGCCTCGACCCCGGGCTCCCGCTGCCCAGCCGCGCTCAC 

GACGGCGACGCCGGCGTTGATCTCTACAGCGCCGAAGACGTCGAGCTGGCACCTGGGCGCCG 

CGCCCTGGTACGGACGGGTGTTGCGGTCGCCGTCCCGTTCGGCATGGTCGGGCTGGTCCATC 

CGCGCTCCGGGTTGGCCACGCGGGTGGGGCTTTCGATCGTCAACAGTCCGGGCACCATCGAC 

GCGGGTTATCGTGGGGAGATCAAGGTGGCCCTGATCAACTTGGACCCAGCCGCGCCCATCGTG 

GTACATCGCGGTGACCGAATCGCCCAGTTGCTAGTGCAACGGGTTGAGTTGGTCGAGCTGGTC 

GAGGTCTCGTCGTTCGACGAGGCCGGGCTGGCCTCGACATCCCGCGGCGACGGTGGCCACGG 

TTCCTCCGGCGGACATGCGAGTTTGTGA 

>Rv2782c pepR protease/peptidase, M16 family (insulinase) TB.seq 3089045:3090358 MW:47074 
>emb|AL123456|MTBH37RV:c3090358-3089042, pepR SEQ ID NO:105

ATGCCGCGACGGTCACCAGCTGACCCCGCGGCGGCGCTGGCGCCGCGGCGCACCACCCTGC 

CGGGCGGGCTGCGAGTGGTCACCGAATTCCTGCCCGCGGTGCACTCCGCGTCGGTCGGGGTG 

TGGGTCGGCGTCGGATCGCGCGACGAAGGCGCCACGGTGGCCGGGGCGGCGCACTTCCTTGA 

GCATTTGCTGTTCAAGTCGACGCCCACCCGCTCTGCCGTGGACATTGCGCAGGCGATGGACGC 

GGTGGGCGGGGAACTGAACGCATTCACCGCCAAGGAGCACACCTGCTACTACGCCCACGTGCT 

CGGCAGCGACTTGCCGTTGGCCGTCGACCTGGTCGCCGATGTGGTGCTCAACGGCCGCTGTGC 

CGCCGACGATGTCGAGGTGGAACGTGACGTCGTCCTCGAGGAGATCGCGATGCGCGACGACG 

ACCCCGAGGACGCCTTGGCGGACATGTTCCTGGCGGCGTTGTTCGGCGACCACCCGGTCGGTC 

GCCCGGTGATCGGCAGCGCGCAATCCGTGTCGGTGATGACGCGGGCTCAACTGCAATCGTTTC 

ACCTGCGGCGCTATACCCCGGAGCGGATGGTCGTCGCGGCCGCCGGCAATGTGGATCACGAC 

GGGCTGGTTGCGTTGGTCCGCGAGCACTTCGGGTCCCGGTTGGTCCGGGGGAGACGGCCAGT 

TGCGCCGCGCAAGGGTACCGGCCGGGTCAACGGCAGCCCCCGGTTGACACTGGTTAGCCGCG 

ACGCCGAACAGACGCATGTGTCGCTGGGCATCCGCACACCCGGGCGCGGCTGGGAGCATCGT 

TGGGCACTGTCGGTGCTGCACACCGCGCTGGGCGGTGGCTTGAGTTCCCGGCTGTTCCAGGAG 

GTCCGCGAGACCCGCGGGCTGGCCTACTCGGTCTACTCCGCGCTGGATCTCTTCGCCGACAGC 

GGCGCGCTTTCGGTGTACGCGGCCTGCCTGCCCGAACGCTTCGCCGACGTGATGCGGGTGAC
CGCCGATGTGCTGGAAAGCGTGGCACGCGACGGCATCACCGAGGCGGAATGCGGCATCGCCA 
AGGGATCGCTGCGGGGTGGGCTGGTGCTAGGGCTGGAGGATTCCAGCTCCCGGATGAGCCGG 
CTCGGCCGCAGCGAGTTGAACTACGGCAAGCACCGCAGCATCGAACACACCTTGCGGCAAATC 
GAGCAGGTCACCGTGGAGGAGGTCAACGCGGTGGCCCGCCACCTGCTGAGCAGGCGCTACGG
TGCTGCCGTTCTTGGCCCACACGGATCGAAACGATCACTGCCGCAACAACTTCGAGCGATGGTA 
GGGTAG 

>Rv2783c gpsI pppGpp synthase and polyribonucleotide phosphorylase TB.seq 3090339:3092594 MW:79736
>emb|AL123456|MTBH37RV:c3092594-3090336, gpsI SEQ ID NO:106

ATGTCTGCCGCTGAAATTGACGAAGGCGTGTTCGAGACGACCGCCACCATCGACAACGGGAGC 
TTTGGCACCCGGACCATCCGCTTCGAGACCGGCCGATTGGCCTTGCAGGCCGCCGGCGCGGT 
GGTCGCCTACCTCGACGACGACAACATGCTGCTGTCGGCGACCACCGCCAGCAAGAACCCCAA 

AGAACACTTCGACTTCTTCCCCCTCACGGTCGACGTCGAGGAGCGCATGTATGCGGCCGGCCG
CATCCCCGGTTCGTTCTTCCGTCGCGAGGGCCGACCCTCCACCGACGCGATCCTGACCTGCCG 
GCTCATCGACCGCCCGCTGCGCCCGTCGTTTGTCGACGGGCTGCGCAACGAGATCCAAATCGT 
GGTGACGATTCTCAGCCTGGATCCGGGCGATCTCTACGACGTATTGGCGATCAACGCGGCGTC 
GGCGTCCACCCAGCTGGGCGGTCTGCCGTTCTCCGGGCCCATCGGCGGTGTGCGGGTGGCGC 

TCATCGACGGCACCTGGGTCGGCTTCCCCACCGTCGACCAGATCGAGCGCGCCGTGTTCGACA
TGGTCGTGGCCGGCCGGATCGTCGAGGGTGATGTTGCCATCATGATGGTCGAAGCCGAGGCCA 
CCGAAAACGTCGTCGAGCTCGTCGAAGGTGGTGCCCAAGCGCCGACGGAAAGCGTGGTGGCC 
GCGGGCCTGGAGGCGGCCAAGCCGTTTATCGCCGCGCTGTGCACCGCGCAGCAGGAGCTTGC 
CGATGCCGCTGGAAAGTCGGGCAAACCGACCGTCGACTTCCCGGTGTTCCCTGACTACGGCGA 

AGACGTGTACTACTCGGTGTCCTCGGTGGCCACCGACGAGTTGGCCGCCGCGTTGACCATCGG
CGGTAAAGCCGAGCGCGACCAGCGCATCGACGAAATCAAGACCCAGGTTGTGCAGCGGCTCGC 
CGACACCTACGAGGGTCGCGAAAAGGAGGTCGGCGCCGCGTTGCGTGCCCTGACCAAAAAGCT 
GGTTCGGCAGCGCATCCTCACCGACCATTTCCGTATCGACGGCCGCGGCATCACCGACATTCG 
CGCATTGTCGGCCGAGGTGGCCGTGGTTCCGCGCGCGCACGGCAGCGCGCTGTTCGAACGCG 

GCGAAACCCAGATCCTGGGTGTGACCACACTCGACATGATCAAGATGGCCCAGCAGATCGACT
CGTTGGGGCCGGAGACATCGAAGCGGTACATGCACCACTACAACTTCCCGCCGTTCTCCACCG 
GCGAGACCGGTCGGGTCGGTTCGCCCAAGCGGCGTGAGATCGGGCACGGCGCACTGGCCGA 
GCGGGCCCTGGTGCCGGTGTTGCCGAGCGTCGAGGAATTCCCGTATGCCATTCGCCAGGTGTC 
GGAGGCTCTGGGCTCCAACGGGTCGACCTCGATGGGGTCGGTGTGCGCGTCGACGCTGGCGC 

TGCTCAACGCCGGGGTGCCGCTCAAGGCGCCGGTGGCCGGCATCGCGATGGGCCTGGTCTCC
GACGACATTCAAGTAGAAGGGGCGGTCGACGGCGTTGTGGAGCGTCGCTTCGTCACCCTCACC 
GACATCCTCGGCGCCGAAGACGCGTTCGGTGACATGGACTTCAAGGTCGCCGGGACCAAGGAC
TTCGTCACCGCGCTGCAGCTGGACACCAAGCTCGACGGGATCCCTTCGCAGGTGCTTGCCGGA
GCACTCGAGCAGGCCAAGGACGCCCGCCTCACGATCTTGGAGGTGATGGCTGAGGCCATCGAT 
AGACCCGACGAAATGAGTCCCTACGCCCCGCGGGTGACCACCATCAAGGTTCCGGTGGACAAG 
ATCGGGGAGGTCATCGGACCCAAGGGCAAGGTCATCAACGCCATCACCGAGGAGACCGGCGC 
GCAGATCTCCATCGAAGACGACGGCACCGTGTTCGTCGGCGCCACCGACGGGCCATCGGCACA
GGCCGCGATCGACAAGATCAACGCCATCGCCAACCCGCAGCTGCCGACGGTGGGCGAACGGT 
TCCTCGGAACCGTGGTCAAGACCACCGATTTCGGTGCCTTTGTATCGTTGCTGCCTGGCCGCGA 
CGGTCTGGTGCACATTTCCAAACTCGGCAAGGGCAAGCGCATCGCGAAGGTCGAGGACGTTGT 
CAATGTCGGTGACAAGCTGCGGGTGGAGATCGCCGACATCGACAAACGGGGCAAGATCTCCCT 
GATCCTGGTCGCCGACGAGGACAGCACCGCCGCCGCTACCGATGCCGCGACGGTCACCAGCT
GA 

>Rv2793c truB tRNA pseudouridine 55 synthase TB.seq 3102364:3103257 MW:31821 
>emb|AL123456|MTBH37RV:c3103257-3102361, truB SEQ ID NO:107 

ATGAGCGCAACCGGCCCCGGAATCGTGGTTATCGACAAGCCCGCGGGAATGACCAGCCATGAC
GTGGTGGGGCGGTGCCGCCGCATCTTCGCCACCCGGCGGGTCGGCCACGCGGGCACCCTGG 
ACCCGATGGCCACCGGGGTGTTGGTGATCGGCATCGAACGCGCCACCAAGATCCTCGGTCTGC 
TGACGGCGGCCCCCAAGTCGTATGCCGCCACCATCCGCTTGGGTCAGACCACTTCCACCGAGG 
ACGCCGAAGGTCAAGTGCTGCAGTCGGTTCCGGCTAAGCACCTGACCATCGAGGCGATCGACG 

CCGCGATGGAGCGGCTGCGCGGTGAGATCCGGCAGGTGCCGTCGTCGGTCAGCGCGATCAAG
GTCGGTGGCCGACGCGCCTATCGGTTGGCCCGCCAGGGGCGCTCCGTGCAATTGGAAGCCCG 
GCCGATCCGCATCGACCGGTTCGAGCTGCTGGCCGCACGCCGGCGCGACCAGCTCATCGATAT 
CGATGTGGAGATCGACTGCTCCTCGGGAACCTACATCCGCGCGTTGGCACGCGACCTCGGCGA 
CGCGCTTGGGGTGGGAGGCCATGTGACGGCGTTGCGGCGCACCCGCGTCGGCCGCTTCGAGC 

TGGACCAGGCGAGATCGCTCGACGATCTCGCGGAGCGCCCCGCGCTGAGCCTGAGCCTCGAT
GAGGCCTGCCTGCTGATGTTTGCGCGCCGCGACCTGACCGCCGCGGAGGCCAGCGCGGCCGC 
CAACGGCCGGTCCCTGCCGGCGGTCGGTATCGACGGCGTGTACGCGGCCTGTGACGCCGACG 
GCCGGGTTATCGCGCTGCTGCGTGACGAGGGTTCGCGGACCAGGTCGGTGGCGGTGCTCCGC 
CCGGCGACGATGCACCCCGGGTAG 

>Rv2797c - TB.seq 3105619:3107304 MW:58761 >emb|AL123456|MTBH37RV:c3107304-3105616, 
Rv2797c SEQ ID NO:108

GTGCCACTGACCGTGGCCGATATCGATCGGTGGAACGCGCAAGCGGTCCGGGAGGTGTTTCAC 
GCGGCCAGTGCCCGAGCGGAGGTGACGTTCGAGGCGTCGCGTCAGTTGGCCGCGCTGTCGAT 
TTTTGCGAACTCGGGTGGCAAGACCGCTGAGGCGGCGGCACACCACAACGCGGGCATTCGCC
GAGACCTCGACGCCCACGGCAACGAGGCGTTGGCGGTTGCCCGGGCGGCCGACAGGGCCGC 
CGACGGGATTGTGAAGGTTCAGTCCGAGCTGGCCGCACTACGCCATGCCGCCGCGGCCGCCG
AGCTGACGATCGATGCGCTGATCAACCGGGTGGTGCCGATCCCCGGGCTGCGATCCACCGAG
GCGCAGTGGGCGCGGACGCTGGCCAAGCAAACGGAGCTGCAGGCGGAGCTGGATGCGATTAT 
GGCCGAGGCCAATGCCGTCGACGAGGAGCTGGCCTCAGCGGTCAATATGGCCGACGGTGACG 
CGCCCATCCCGGCCGATTCCGGCCCGCCGGTCGGTCCCGAGGGGCTGACCCCGACCCAGCTC 
GCCAGCGATGCCAACGAGGAGCGGCTGCGCGAGGAGCGCGCCCGCCTGCAGGCCCACCTCG
AGCGGTTACAGGCGGAGTATGACCAACTGAGTGTGCGGGCCGCCCGTGACTACCACAACGGCA 
TCCTCGACGGTGACGCGGTGGGCCGACTGGCAGCGCTTACCGACGAGCTGAGCGCCGCCAGG 
GGCCGGCTGGGTGAGCTCGATGCCGTCGACGAGGCGTTGAGCCGAGCACCCGAGACCTACCT 
GACCCAGCTGCAGATTCCCGAGGACCCAAATCAGCAGGTGCTGGCGGCCGTGGCCGTCGGTAA 
TCCCGACACCGCCGCCAATGTGTCGGTGACGGTTCCCGGCGTCGGGTCCACCACCCGGGGCG
CCCTGCCCGGCATGGTGACCGAAGCCCGCGACCTGCGGTCGGAGGTAATCCGGCAACTCAATG 
CTGCCGGCAAGCCCGCATCGGTTGCCACCATCGCCTGGATGGGCTACCACCCGCCCCCGAACC 
CACTCGACACCGGCAGTGCGGGCGATCTGTGGCAGACCATGACCGATGGGCAGGCACACGCG 
GGCGCGGCCGATCTGTCGCGGTATTTGCAGCAGGTGCGCGCCAATAACCCCAGTGGCCACCTG 
ACCGTGTTGGGGCACTCGTATGGGTCGCTGACGGCGTCGCTGGCGTTGCAGGACCTCGATGCC
CAGAGCGCCCATCCGGTCAACGACGTCGTGTTTTACGGCTCACCCGGCTTGGAGCTGTACAGC 
CCGGCGCAGCTCGGGCTCGATCACGGGCACGCTTATGTCATGCAGGCCCCCCACGACCTCATC 
ACCAATCTGGTGGCGCCGTTGGCGCCGCTGCACGGATGGGGCCTGGACCCCTATCTGACCCCC 
GGGTTCACGGAGCTGTCGTCACAGGCGGGTTTTGATCCGGGCGGGATCTGGCGTGACGGAGT 
GTATGCCCACGGGGACTACCCGCGGTCCTTCCTCGATGCCGCCGGCCAGCCGCAGCTGCGGA
TGTCCGGCTATAACCTGGCGGCGATCGCCGCCGGGCTGCCCGACAACACGGTGGGCCCGCCG 
CTGCTTCCGCCAATTCTGGGTGGCGGCATGCCGGCAGCGCCCGGCCCAGCACTGAGAGGGGG 
ACGTTGA 

>Rv2864c ponA2 TB.seq 3175454:3177262 MW:63015
>emb|AL123456|MTBH37RV:c3177262-3175451, Rv2864c SEQ ID NO:109

ATGGTAACTAAAACAACATTAGCCTCAGCCACCTCAGGTTTGCTGCTGCTTGCGGTCGTCGCCAT 
GTCGGGCTGCACCCCGCGTCCCCAAGGGCCCGGTCCGGCGGCCGAAAAGTTCTTCGCCGCGC 
TGGCCATCGGTGACACCGCCTCCGCCGCCCAGCTCAGCGACAACCCCAACGAGGCGCGCGAA 
GCGCTGAACGCGGCCTGGGCGGGGCTGCAGGCCGCCCACCTGGATGCGCAGGTTCTCAGCGC 
CAAGTACGCCGAGGACACCGGTACGGTCGCTTATCGCTTCAGCTGGCATCTGCCCAAGGACCG 
AATCTGGACCTATGACGGCCAGCTGAAGATGGCCCGCGACGAAGGGCGTTGGCACGTTCGCTG 
GACCACCAGCGGGTTGCATCCCAAGCTAGGCGAACATCAAACGTTCGCGCTACGAGCCGACCC 
GCCGCGGCGCGCCTCGGTGAACGAAGTCGGCGGCACCGATGTGCTGGTGCCGGGCTATCTGT 
ATCACTACTCGCTGGACGCCGGCCAGGCCGGCCGCGAGCTCTTCGGCACGGCACACGCGGTG 
GTGGGCGCGCTGCACCCCTTCGACGACACGCTCAATGATCCGCAGCTGCTGGCCGAACAGGCC 
AGCTCGTCGACCCAGCCGTTGGACCTGGTCACGTTGCACGCCGACGACAGCAACCGGGTGGC 

CGCGGCGATCGGGCAGCTGCCTGGCGTGGTGATCACACCGCAGGCCGAGCTGCTCCCGACCG
ACAAGCACTTCGCGCCGGCGGTCCTCAACGATGTCAAGAAGGCCGTCGTCGATGAACTCGACG 
GCAAGGCGGGTTGGCGGGTGGTGAGCGTCAACCAAAATGGCGTCGACGTCTCGGTGCTGCAC 
GAGGTCGCCCCATCACCTGCGTCGTCGGTTTCGATCACGTTGGATCGGGTCGTGCAAAACGCC 
GCGCAACACGCGGTGAACACCCGGGGCGGCAAGGCGATGATCGTCGTGATCAAGCCGTCGAC
CGGCGAGATCCTGGCGATCGCGCAGAACGCCGGGGCCGATGCGGACGGTCCGGTCGCGACCA 
CCGGTCTATATCCACCCGGGTCGACATTCAAGATGATCACCGCCGGTGCGGCCGTCGAGCGTG 
ACCTGGCTACCCCTGAGACGCTGCTGGGTTGCCCCGGGGAGATCGACATCGGGCATCGCACCA 
TTCCCAACTACGGTGGCTTTGATCTGGGCGTGGTGCCGATGTCACGCGCGTTTGCCAGTTCCTG 

CAACACCACCTTCGCCGAGCTGAGCAGCAGGCTGCCTCCCCGCGGTCTGACTCAGGCGGCCC
GGCGGTACGGGATCGGGCTTGACTACCAGGTGGACGGCATCACCACGGTGACCGGTTCGGTG 
CCGCCGACGGTGGACCTGGCCGAACGCACCGAGGACGGTTTCGGCCAGGGCAAGGTGCTGGC 
CAGCCCGTTCGGCATGGCCTTGGTGGCGGCGACGGTAGCCGCCGGGAAGACCCCGGTTCCAC 
AGCTGATCGCCGGCCGGCCGACGGCCGTCGAAGGCGATGCCACACCGATCAGCCAGAAGATG 

ATCGACGCGCTGCGGCCCATGATGCGGTTGGTGGTGACCAATGGCACCGCCAAGGAGATCGCT
GGCTGTGGCGAGGTGTTCGGTAAGACCGGCGAAGCCGAATTCCCGGGCGGATCGCATTCCTG 
GTTCGCCGGGTACCGTGGCGATCTGGCATTTGCGTCGCTGATCGTCGGGGGCGGTAGCTCGGA 
ATACGCGGTGCGGATGACCAAGGTGATGTTCGAATCGCTGCCGCCGGGGTACCTGGCGTAG 

>Rv2868c gcpE TB.seq 3179368:3180528 MW:40451
>emb|AL123456|MTBH37RV:c3180528-3179365, gcpE SEQ ID NO:110

GTGACTGTAGGCTTGGGCATGCCGCAGCCCCCGGCACCCACGCTCGCTCCCCGGCGCGCCAC 
CCGTCAGCTGATGGTCGGCAACGTCGGCGTGGGCAGTGACCATCCGGTCTCGGTGCAATCGAT 
GTGCACCACCAAAACCCACGACGTCAACTCGACATTGCAACAAATCGCCGAGCTGACCGCGGC 

CGGATGCGACATCGTGCGGGTGGCCTGCCCGCGCCAGGAGGACGCCGACGCGCTGGCCGAG
ATCGCCCGGCACAGCCAGATCCCGGTAGTCGCGGACATACATTTCCAGCCGCGCTACATATTCG 
CCGCCATCGACGCTGGATGTGCCGCGGTGCGGGTCAACCCGGGCAACATCAAGGAGTTTGACG 
GCCGGGTGGGTGAGGTCGCCAAGGCGGCGGGTGCGGCCGGGATCCCGATCCGAATCGGTGT 
CAACGCCGGTTCGCTGGACAAACGGTTCATGGAGAAGTATGGCAAAGCCACGCCCGAGGCGCT 

GGTTGAGTCGGCGCTGTGGGAGGCTTCGCTTTTCGAGGAGCATGGCTTCGGTGACATCAAGAT
CAGCGTCAAGCACAACGACCCGGTGGTGATGGTCGCCGCCTACGAGCTGCTTGCTGCACGGTG 
CGACTACCCACTGCACCTCGGTGTCACCGAGGCCGGCCCTGCTTTCCAGGGCACCATCAAGTC 
CGCGGTTGCCTTCGGCGCGTTGCTGTCGCGGGGCATAGGCGACACCATCCGGGTGTCGTTGTC 
GGCCCCGCCGGTCGAGGAAGTCAAGGTGGGCAATCAGGTTCTCGAGTCGTTGAACCTGCGGCC 

GCGTTCGCTCGAGATCGTGTCTTGCCCGTCGTGCGGTCGCGCGCAAGTCGACGTCTACACCCT
GGCCAACGAGGTAACCGCCGGCCTGGATGGTCTCGATGTGCCGTTGCGGGTGGCCGTGATGG 
GGTGTGTCGTCAATGGTCCGGGTGAAGCACGTGAGGCCGACCTGGGCGTGGCGTCCGGCAAC
GGCAAAGGTCAGATCTTTGTACGGGGCGAAGTGATCAAGACCGTGCCCGAAGCACAGATCGTC

GAGACGCTGATCGAGGAGGCGATGCGGCTGGCCGCCGAAATGGGCGAGCAAGATCCGGGCGC 

GACACCGAGCGGTTCGCCTATTGTGACCGTAAGCTGA 

>Rv2869c - TB.seq 3180548:3181759 MW:42835 >emb|AL123456|MTBH37RV:c3181759-3180545,
Rv2869c SEQ ID NO:111

ATGATGTTTGTTACCGGCATTGTGCTGTTCGCGCTCGCGATCCTGATTTCGGTGGCCCTGCACG 
AATGTGGTCACATGTGGGTCGCGCGCCGCACCGGGATGAAGGTACGTCGCTATTTCGTCGGCT 
TTGGCCCCACGTTGTGGTCGACCCGGCGCGGCGAGACCGAATACGGTGTCAAAGCCGTTCCGC 

TGGGCGGCTTCTGTGACATCGCCGGCATGACCCCGGTCGAGGAACTCGACCCCGACGAACGTG
ACCGTGCGATGTACAAGCAGGCCACCTGGAAGCGGGTCGCAGTGTTATTCGCCGGGCCCGGAA 
TGAACCTCGCTATCTGCCTGGTGCTGATCTATGCCATCGCGCTGGTCTGGGGGCTGCCTAACCT 
GCATCCGCCAACCAGGGCCGTAATCGGCGAAACTGGCTGCGTTGCACAGGAAGTGAGCCAGG 
GCAAGCTCGAGCAGTGCACCGGGCCCGGTCCGGCGGCGCTGGCCGGAATTCGCTCCGGTGAC 

GTCGTGGTCAAGGTCGGTGACACCCCGGTGTCCAGTTTCGACGAGATGGCCGCCGCGGTGCG
CAAGTCACACGGCAGCGTCCCGATCGTTGTCGAGCGTGACGGCACCGCGATTGTTACCTACGT 
GGACATCGAATCCACCCAACGCTGGATCCCTAACGGGCAGGGCGGTGAGCTCCAGCCGGCAAC 
GGTCGGTGCGATTGGGGTGGGCGCCGCCCGGGTCGGGCCTGTGCGCTACGGCGTGTTCTCCG 
CCATGCCGGCCACATTCGCGGTCACCGGCGACCTGACCGTGGAGGTGGGCAAGGCGCTGGCC 

GCCCTCCCGACCAAGGTAGGTGCGCTGGTGCGGGCGATCGGCGGCGGGCAGCGTGACCCGC
AGACGCCGATAAGTGTGGTGGGCGCCAGCATCATCGGCGGCGACACCGTCGACCATGGGCTG 
TGGGTGGCGTTCTGGTTCTTCTTGGCCCAGCTGAACCTCATCCTGGCTGCGATCAACCTGCTGC 
CGTTGCTGCCGTTCGATGGCGGCCATATTGCCGTCGCGGTGTTCGAGAGGATCCGCAACATGG 
TCCGGTCGGCTCGTGGCAAGGTGGCGGCCGCACCGGTGAATTACCTCAAACTCTTGCCGGCGA 

CCTATGTGGTCTTGGTTCTTGTCGTCGGGTACATGCTCTTGACCGTCACCGCCGACCTGGTCAA
CCCGATTAGGCTTTTCCAGTAG 

>Rv2870c - TB.seq 3181770:3183077 MW:45324 >emb|AL123456|MTBH37RV:c3183077-3181767,
Rv2870c SEQ ID NO:112

GTGGCTACCGGTGGACGCGTCGTGATCCGGCGGCGCGGTGACAACGAGGTGGTGGCGCACAA
TGATGAGGTGACCAACTCGACCGACGGGCGCGCTGACGGCCGGTTGCGGGTGGTGGTGCTGG 
GCAGTACCGGCTCGATCGGCACCCAGGCGCTTCAGGTCATCGCCGACAATCCGGACCGTTTCG 
AGGTAGTCGGGCTGGCCGCTGGCGGCGCCCATCTGGACACGTTGCTGCGACAACGTGCGCAG 
ACCGGGGTGACCAATATTGCCGTCGCTGACGAGCACGCGGCGCAGCGGGTCGGCGACATCCC 

CTACCACGGATCCGACGCCGCCACCCGGCTGGTCGAGCAGACCGAGGCCGACGTCGTCCTCA
ATGCGCTGGTCGGCGCGTTGGGCCTGCGACCGACGTTGGCCGCGCTCAAGACGGGTGCCCGG 
CTGGCGCTGGCCAACAAGGAATCGCTGGTCGCCGGTGGTTCGCTGGTGCTGCGGGCGGCGCG
GCCCGGTCAGATCGTGCCGGTCGACTCCGAACACTCCGCGCTGGCCCAGTGCCTGCGCGGCG
GCACTCCCGACGAGGTCGCCAAGCTGGTGCTGACGGCCTCGGGAGGGCCGTTTCGGGGCTGG 
TCCGCGGCCGACCTCGAGCATGTCACCCCCGAGCAGGCTGGCGCGCATCCTACGTGGTCGATG 
GGCCCGATGAACACGCTGAATTCGGCGTCGCTGGTCAACAAGGGACTTGAGGTCATCGAAACC 
CACCTGCTGTTCGGCATCCCCTACGACCGCATCGATGTCGTGGTGCACCCCCAGTCGATCATCC
ATTCGATGGTCACCTTCATCGACGGTTCGACGATCGCCCAGGCCAGTCCCCCGGACATGAAGCT 
ACCGATTTCGTTAGCGCTGGGCTGGCCGCGTCGGGTCAGCGGCGCCGCTGCTGCCTGTGATTT 
CCATACCGCGTCGAGCTGGGAGTTCGAGCCGTTGGACACCGACGTCTTCCCCGCGGTCGAGTT 
GGCCCGGCAGGCCGGCGTAGCCGGTGGCTGCATGACCGCGGTTTACAATGCGGCGAACGAAG 
AAGCAGCAGCGGCGTTCCTTGCTGGCCGGATCGGCTTCCCGGCCATCGTCGGCATCATCGCCG
ACGTGTTGCACGCTGCCGACCAATGGGCCGTCGAACCCGCTACCGTGGATGACGTACTCGACG 
CGCAGCGCTGGGCCCGCGAGCGAGCGCAGCGCGCGGTATCTGGTATGGCTTCGGTGGCGATC 
GCAAGCACGGCGAAGCCGGGCGCAGCGGGTCGACACGCATCGACGTTAGAAAGGTCCTGA 

>Rv2922c smc member of Smc1/Cut3/Cut14 family TB.seq 3234189:3238055 MW:139610
>emb|AL123456|MTBH37RV:c3238055-3234186, smc SEQ ID NO:113 

GTGGGTGCAGGGAGTCGGTTTCCGCTGGTGGACCCGCTGCCGAGCGTTGGAGCTCGGCCTGA 
CCGGTTACGCGGCCAACCACGCCGACGGACGCGTGCTGGTGGTCGCCCAGGGTCCGCGCGCT 
GCGTGCCAGAAGCTGCTGCAGCTGCTGCAGGGCGACACGACACCGGGCCGCGTCGCCAAAGT 

CGTCGCCGACTGGTCGCAGTCGACGGAGCAGATCACCGGGTTCAGCGAGCGGTAATCTGGCC
CCTCGTGTACCTCAAGAGTCTGACGTTGAAGGGCTTCAAGTCCTTCGCCGCGCCGACGACTTTA 
CGCTTCGAGCCGGGCATTACGGCCGTCGTTGGGCCCAACGGCTCCGGCAAATCCAATGTGGTC 
GATGCCCTGGCGTGGGTGATGGGGGAGCAGGGGGCAAAGACGCTGCGCGGCGGCAAGATGG 
AAGACGTCATCTTCGCCGGCACCTCGTCGCGTGCGCCGCTGGGCCGCGCCGAAGTCACCGTTA 

GCATCGACAACTCCGACAACGCACTGCCTATCGAATACACCGAGGTGTCGATCACCCGAAGAAT
GTTTCGCGACGGTGCCAGCGAATACGAAATCAACGGCAGCAGTTGCCGTTTGATGGATGTGCA 
GGAGTTGCTGAGCGACTCCGGCATCGGCCGTGAGATGCATGTGATTGTTGGGCAAGGGAAGCT 
CGAGGAGATCTTGCAGTCGCGGCCTGAGGATCGGCGGGCGTTCATCGAGGAAGCCGCCGGTG 
TGCTCAAGCATCGCAAGCGCAAGGAAAAAGCTCTGCGCAAACTCGACACGATGGCGGCGAACC 

TGGCCCGGCTCACCGATCTGACCACCGAGCTCCGGCGTCAACTCAAACCGCTGGGCCGGCAG
GCCGAGGCGGCCCAGCGTGCCGCGGCCATCCAAGCCGATCTGCGCGACGCCCGGCTGCGCCT 
GGCGGCCGACGACTTGGTAAGCCGCAGAGCCGMCGGGAAGCGGTCTTTCAGGCCGAGGCTG 
CGATGCGCCGCGAGCATGACGAGGCCGCCGCCCGGCTGGCGGTGGCATCCGAGGAGCTGGC 
CGCGCATGAGTCCGCGGTCGCCGAACTCTCGACGCGGGCCGAGTCGATCCAGCACACTTGGTT 

CGGGCTGTCTGCGCTGGCCGAACGGGTGGACGCTACGGTGCGCATCGCCAGCGAACGCGCCC
ATCATCTCGATATCGAGCCGGTAGCGGTCAGCGACACCGACCCCAGAAAGCCCGAGGAGCTAG 
AAGCCGAGGCCCAGCAGGTGGCCGTCGCCGAGCAACAACTGTTAGCGGAGCTGGACGCGGCG
CGTGCCCGACTCGATGCTGCCCGTGCAGAGCTGGCCGACCGGGAGCGCCGCGCCGCCGAGG
CCGACCGGGCACACCTGGCGGCGGTCCGGGAGGAGGCGGACCGCCGTGAGGGACTGGCGCG 
GCTGGCTGGCCAGGTGGAGACCATGCGGGCGCGTGTCGAATCGATCGATGAGAGCGTGGCAC 
GGTTGTCCGAGCGGATCGAGGATGCCGCAATGCGCGCCCAGCAGACCCGAGCCGAGTTCGAA 
ACCGTGCAGGGCCGCATCGGTGAACTGGATCAAGGCGAGGTCGGCCTGGATGAGCACCACGA
GCGTACTGTGGCCGCGTTGCGGTTGGCCGACGAACGCGTCGCCGAGCTGCAATCCGCCGAAC 
GCGCCGCCGAACGCCAGGTGGCATCGCTACGGGCTCGCATCGATGCGCTCGCAGTGGGGCTA 
CAGCGCMGGACGGCGCGGCGTGGCTGGCGCACAATCGCAGTGGCGCAGGGCTTTTCGGTTC 
GATCGCCCAATTGGTGAAGGTACGTTCCGGCTATGAAGCGGCACTGGCCGCGGCGCTCGGGC 

CGGCGGCCGACGCACTTGCGGTGGACGGCCTGACTGCCGCGGGTAGTGCCGTCAGCGCACTC
AAACAAGCCGACGGCGGTCGCGCGGTCCTCGTGCTGAGTGACTGGCCGGCCCCGCAAGCCCC 
CCAATCCGCCTCGGGGGAGATGCTGCCTAGCGGCGCCCAGTGGGCCCTAGACCTGGTCGAGT 
CTCCACCGCAGTTGGTTGGCGCGATGATCGCCATGCTTTCGGGTGTCGCGGTGGTCAACGACC 
TGACTGAGGCAATGGGCCTGGTCGAGATTCGTCCGGAGCTACGCGCGGTCACCGTTGACGGTG 

ATCTGGTGGGCGCCGGCTGGGTCAGCGGCGGATCGGACCGCAAGCTGTCCACCTTGGAGGTC
ACCTCCGAGATCGACAAGGCCAGGAGTGAGCTGGCCGCTGCCGAGGCGCTGGCGGCGCAATT 
GAATGCGGCCCTGGCCGGTGCGCTGACCGAGCAGTCCGCCCGCCAGGACGCGGCCGAGCAA 
GCCTTGGCCGCGCTTAACGAATCCGACACGGCCATCTCGGCGATGTACGAGCAGCTGGGCCGC 
CTCGGGCAGGAGGCCCGCGCGGCGGAAGAAGAGTGGAACCGGTTGCTGCAGCAGCGTACGGA 

ACAGGAAGCCGTGCGCACACAGACTCTCGACGACGTCATACAACTTGAGACCCAGCTGCGTAA
GGCCCAGGAGACCCAACGGGTGCAGGTGGCCCAACCGATCGACCGCCAGGCGATCAGTGCCG 
CTGCCGATCGCGCCCGCGGTGTCGAAGTGGAAGCCCGGCTGGCGGTGCGCACCGCCGAGGAA 
CGCGCCAACGCGGTTCGCGGGCGGGCCGATTCGCTGCGCCGTGCGGCTGCGGCGGAACGTG 
AGGCGCGGGTGCGGGCTCAGCAAGCACGCGCCGCAAGACTGCATGCGGCCGCGGTGGCCGC 

AGCGGTCGCCGACTGCGGACGGCTGCTGGCCGGGCGGTTGCACCGGGCGGTGGACGGGGCG
TCGCAACTGCGCGACGCGTCGGCCGCGCAACGTCAGCAGCGGTTAGCGGCGATGGCCGCGGT 
GCGCGACGAGGTGAACACGCTGAGCGCCCGAGTGGGGGAACTCACCGATTCGCTGCACCGCG 
ACGAGCTGGCTAACGCGCAGGCGGCGCTGCGTATCGAGCAGCTTGAGCAGATGGTGCTAGAG 
CAGTTCGGAATGGCGCCGGCCGACTTGATCACCGAATACGGTCCACATGTGGCGCTACCACCG 

ACCGAGCTCGAGATGGCTGAGTTCGAGCAAGCCCGCGAACGCGGCGAGCAGGTGATTGCGCC
CGCCCCCATGCCGTTCGACCGGGTTACCCAGGAGCGCCGGGCCAAACGCGCCGAGCGTGCGC 
TTGCCGAGTTGGGCAGGGTCAACCCGCTGGCGCTCGAAGAGTTTGCTGCCTTGGAGGAGCGCT 
ACAATTTCCTGTCCACCCAACTCGAGGATGTCAAGGCTGCCCGCAAGGATCTGCTGGGCGTCGT 
CGCCGATGTTGACGCCCGCATCCTGCAGGTGTTCAATGACGCGTTCGTAGACGTGGAACGCGA 

ATTTCGCGGCGTGTTCACCGCATTGTTCCCCGGTGGTGAAGGACGGCTGCGGCTGACCGAGCC
CGACGACATGCTCACCACCGGCATCGAGGTCGAAGCCCGCCCGCCGGGCAAGAAGATTACCC 
GACTGTCTTTGCTCTCCGGTGGCGAGAAGGCGCTGACCGCGGTGGCGATGCTGGTCGCGATCT
TTCGTGCCCGTCCATCGCCGTTCTACATCATGGACGAGGTGGAGGCCGCCCTCGACGACGTGA
ACCTGCGCCGACTGCTCAGCCTGTTCGAACAGCTGCGAGAGCAGTCGCAGATCATCATCATCAC 
CCACCAGAAGCCGACGATGGAGGTCGCGGACGCACTGTACGGCGTAACCATGCAGAACGACG 
GCATCACCGCGGTCATCTCGCAGCGCATGCGCGGTCAGCAGGTGGATCAGCTGGTTACCAATT 
CCTCGTAG

>Rv2925c rnc RNAse III TB.seq 3239829:3240548 MW:25400
>emb|AL123456|MTBH37RV:c3240548-3239826, rnc SEQ ID NO:114

ATGATCCGGTCACGACAACCCCTGCTCGACGCACTCGGTGTGGACCTCCCGGACGAGCTGCTC 
TCACTGGCGTTGACCCACCGCAGCTACGCCTACGAGAACGGCGGGCTGCCGACCAACGAGCGT
TTGGAGTTTCTCGGCGATGCCGTGCTAGGGCTGACCATCACCGACGCGCTGTTCCATCGTCATC 
CTGATCGGTCGGAGGGGGATCTGGCCAAACTGCGGGCCAGCGTAGTCAACACCCAGGCCCTG 
GCCGACGTCGCACGCCGCCTCTGTGCGGAAGGCCTCGGTGTTCACGTGCTATTGGGTCGCGGC 
GAGGCGAACACCGGCGGGGCCGACAAGTCCAGCATTCTGGCCGACGGTATGGAATCGCTGCT 
GGGCGCGATCTACCTGCAACACGGTATGGAGAAGGCCCGTGAGGTGATCCTGCGGCTGTTTGG
CCCGTTGCTGGACGCCGCGCCGACCCTGGGTGCGGGATTGGATTGGAAGACCAGCTTGCAGG 
AGCTGACTGCAGCGCGAGGGCTGGGTGCGCCGTCATACCTGGTCACCTCCACCGGCCCGGAC 
CACGATAAGGAATTCACCGCGGTGGTTGTCGTGATGGACAGCGAATACGGTTCAGGAGTGGGC 
CGGTCCAAAAAAGAAGCCGAGCAAAAAGCCGCGGCGGCCGCTTGGAAAGCCCTGGAAGTGCTC 
GACAACGCCATGCCGGGCAAAACCTCCGCCTAA

>Rv2934 ppsD TB.seq 3262245:3267725 MW:193317 
>emb|AL123456|MTBH37RV:3262245-3267728, ppsD SEQ ID NO:115

ATGACAAGTCTGGCGGAGCGCGCGGCGCAACTGTCGCCGAACGCGCGAGCGGCCCTGGCGCG 
CGAGCTCGTCCGTGCGGGTACGACCTTCCCGACCGACATCTGCGAGCCGGTGGCGGTGGTGG
GCATCGGCTGTCGCTTTCCGGGGAATGTGACTGGGCCAGAGAGCTTTTGGCAGCTACTGGCCG 
ACGGTGTGGACACAATCGAGCAGGTGCCGCCTGATCGGTGGGATGCGGACGCGTTCTACGATC 
CCGATCCTTCGGCGTCGGGTCGGATGACGACGAAATGGGGTGGTTTCGTTTCCGATGTCGACG 
CGTTCGACGCCGACTTTTTCGGAATCACTCCTCGGGAAGCCGTGGCGATGGACCCGCAGCATC 
GGATGCTGCTCGAGGTTGCCTGGGAAGCGTTGGAGCACGCGGGTATTCCGCCGGATTCCTTGA
GCGGCACTCGAACCGGCGTGATGATGGGTCTGTCGTCGTGGGACTACACGATCGTCAATATCG 
AGCGCAGAGCCGACATCGACGCGTACCTGAGCACCGGAACCCCGCACTGTGCCGCGGTGGGG 
CGGATCGCGTATCTGTTGGGATTGCGTGGTCCGGCCGTCGCCGTAGATACCGCTTGTTCGTCGT 
CGCTGGTGGCAATTCACTTGGCGTGTCAGAGCCTTCGCCTGCGTGAAACCGACGTGGCATTGG 
CGGGCGGGGTGCAGCTCACCTTGTCACCGTTCACCGCCATCGCGCTGTCCAAGTGGTCGGCGC
TGTCACCGACCGGCCGATGCAACAGCTTCGACGCCAACGCGGATGGATTCGTGCGCGGCGAG 
GGCTGCGGCGTGGTGGTGCTCAAGCGGTTGGCCGACGCGGTGCGCGACCAGGACCGGGTGCT
TGCGGTGGTCCGCGGTTCGGCAACTAACTCCGATGGTCGGTCCAACGGCATGACCGCACCGAA
CGCGCTGGCGCAGCGTGACGTGATCACATCCGCCCTCAAGCTTGCGGATGTTACCCCTGACAG 
CGTGAACTATGTCGAAACACACGGCACCGGAACGGTGTTGGGGGACCCCATCGAGTTCGAGTC 
GCTGGCGGCCACTTATGGCCTGGGTAAAGGCCAGGGCGAGAGCCCGTGCGCATTGGGGTCGG 
TCAAGACCAACATCGGCCACCTGGAGGCGGCCGCCGGTGTGGCTGGATTCATCAAGGCGGTGC
TGGCGGTGCAACGTGGGCACATTCCCCGCAACTTGCACTTCACCCGGTGGAACCCGGCCATCG 
ACGCGTCGGCGACGCGGCTGTTCGTGCCGACCGAAAGCGCCCCGTGGCCGGCGGCTGCCGGT 
CCACGCAGGGCTGCGGTGTCATCGTTCGGCCTCAGCGGGACCAACGCGCACGTGGTGGTCGA 
GCAGGCACCCGACACCGCAGTAGCCGCAGCCGGCGGCATGCCGTATGTTTCGGCGCTGAACG 

TCTCCGGCAAGACGGCCGCGCGGGTGGCGTCGGCGGCGGCGGTGCTGGCCGACTGGATGTC
GGGGCCGGGCGCGGCGGCACCACTGGCCGACGTGGCACACACGTTGAACCGGCACCGGGCC 
CGGCACGCCAAGTTCGCCACCGTCATCGCGCGTGACCGCGCCGAGGCGATCGCGGGGTTGCG 
AGCGCTGGCGGCCGGACAACCACGCGTTGGGGTGGTGGATTGCGACCAGCATGCCGGTGGGC 
CTGGCCGGGTTTTTGTGTATTCGGGTCAGGGCTCGCAGTGGGCGTCGATGGGCCAGCAGTTGC

TGGCCAACGAACCGGCGTTCGCCAAGGCGGTAGCCGAGCTGGATCCGATATTCGTTGACCAGG
TTGGCTTTTCGCTGCAGCAAACGCTTATCGACGGCGACGAGGTGGTGGGCATCGACCGCATCC 
AGCCGGTGCTGGTCGGGATGCAGTTGGCGCTGACCGAGTTATGGCGGTCCTATGGGGTGATTC 
CAGATGCCGTGATCGGGCACTCGATGGGTGAGGTGTCGGCGGCAGTGGTGGCCGGCGCGTTG 
ACGCCCGAGCAGGGCTTGCGGGTCATCACCACCCGGTCGCGGTTGATGGCGCGGCTGTCGGG 

GCAGGGAGCGATGGCGCTGCTCGAGCTGGATGCCGACGCCGCCGAGGCGCTGATTGCCGGCT
ATCCGCAGGTGACGCTGGCGGTGCATGCGTCACCGCGCCAGACGGTGATCGCCGGGCCGCCC 
GAGCAGGTGGACACGGTGATCGCGGCGGTAGCGACGCAAAACCGGTTGGCGCGCCGCGTCGA 
AGTCGACGTGGCCTCCCATCACCCGATCATCGATCCCATACTGCCCGAGTTGCGAAGCGCGTTA 
GCGGATTTGACTCCGCAGCCGCCGAGCATCCCGATCATTTCCACTACGTACGAAAGCGCGCAG 

CCGGTGGCGGATGCCGACTATTGGTCGGCCAACCTGCGCAACCCGGTGCGATTCCACCAGGCC
GTCACCGCCGCCGGTGTCGACCACAACACCTTCATCGAAATCAGCCCTCACCCCGTGCTCACG 
CACGCACTCACCGACACCCTGGATCCGGACGGCAGCCATACAGTCATGTCGACGATGAACCGC 
GAACTGGACCAGACGCTGTATTTCCACGCCCAACTCGCCGCGGTCGGTGTGGCTGCGTCCGAG 
CACACCACCGGTCGCCTTGTCGACCTGCCCCCCACACCGTGGCACCATCAGCGATTCTGGGTC 

ACGGATCGTTCGGCGATGTCCGAGCTGGCCGCGACCCACCCGCTCCTGGGCGCGCACATCGA
GATGCCGCGCAACGGAGACCATGTCTGGCAGACCGATGTCGGCACCGAGGTCTGTCCCTGGTT 
GGCAGACCACAAGGTGTTCGGTCAACCCATCATGCCGGCCGCGGGGTTCGCCGAGATCGCCTT 
GGCGGCGGCCAGCGAAGCCCTCGGCACAGCCGCCGACGCCGTCGCACCCAACATCGTGATCA 
ACCAGTTCGAGGTGGAGCAGATGCTGCCCCTCGACGGCCACACGCCGCTAACGACGCAGTTAA 

TTCGCGGCGGGGACAGCCAGATTCGGGTCGAGATCTATTCCCGCACGCGTGGCGGAGAGTTCT
GCCGACACGCCACGGCCAAGGTTGAACAATCGCCGCGCGAATGTGCGCACGCGCACCCGGAA 
GCCCAAGGTCCCGCCACCGGGACAACAGTGTCGCCGGCCGATTTTTATGCCCTGCTCCGCCAA
ACCGGCCAACACCATGGTCCGGCGTTCGCGGCCTTAAGCCGGATCGTGCGCCTGGCCGATGGT
TCCGCGGAAACCGAGATCAGCATTCCCGACGAGGCGCCGCGCCATCCCGGGTATCGGCTGCA 
CCCCGTGGTATTGGATGCGGCATTGCAAAGCGTGGGTGCCGCGATACCCGACGGCGAGATCGC 
GGGGTCGGCGGAAGCCAGCTATCTGCCAGTGTCGTTCGAGACCATCCGGGTGTACCGCGACAT 
CGGTCGGCACGTCAGGTGTCGTGCCCACCTGACAAACCTCGACGGCGGCACCGGAAAGATGG
GCAGGATCGTCCTAATCAACGACGCCGGCCACATAGCGGCCGAAGTGGACGGCATCTATCTGC 
GTCGTGTCGAACGCCGTGCGGTACCCCTGCCACTAGAGCAGAAGATCTTCGATGCCGAATGGA 
CCGAAAGCCCGATCGCAGCCGTGCCGGCTCCGGAGCCAGCTGCCGAGACGACGCGGGGAAGT 
TGGCTGGTACTCGCCGATGCAACGGTGGATGCGCCAGGCAAGGCCCAGGCCAAGTCGATGGC 

CGACGACTTCGTGCAGCAGTGGCGCTCACCGATGCGGCGGGTGCACACCGCCGATATCCACGA
CGAATCGGCGGTGCTGGCCGCATTTGCAGAAACGGCAGGCGATCCCGAGCACCCGCCGGTTG 
GCGTGGTGGTGTTCGTCGGCGGTGCCTCGAGTCGACTGGACGACGAGCTGGCGGCGGCGCGC 
GACACGGTGTGGTCGATCACCACGGTGGTTCGTGCGGTCGTCGGCACGTGGCACGGCCGATCA 
CCGCGGCTATGGCTGGTCACCGGGGGCGGACTTTCCGTTGCCGACGACGAGCCGGGAACACC 

CGCGGCGGCTTCCTTGAAAGGGCTGGTGCGGGTGCTCGCCTTCGAGCACCCGGACATGCGCA
CCACCCTGGTCGATCTGGACATCACACAAGACCCGCTGACCGCGCTGAGCGCGGAACTGCGGA 
ATGCCGGGAGTGGGTCGCGCCATGATGACGTGATCGCGTGGCGCGGCGAGCGCAGGTTCGTC 
GAACGGCTGTCGCGCGCCACGATCGATGTATCCAAAGGGCATCCGGTGGTGCGCCAGGGAGC 
GTCGTACGTCGTCACCGGCGGCCTCGGCGGTCTCGGCCTGGTCGTCGCTCGTTGGCTGGTGG 

ACCGCGGCGCCGGCCGGGTGGTGCTGGGTGGCCGCAGCGATCCCACTGACGAGCAGTGCAAC
GTCCTGGCCGAACTGCAGACCCGCGCCGAGATCGTGGTTGTCCGTGGCGACGTGGCATCGCC 
GGGGGTGGCAGAAAAGCTGATTGAGACGGCCCGACAGTCTGGGGGCCAATTGCGCGGCGTCG 
TGCACGCCGCCGCGGTCATCGAAGACAGCCTGGTGTTCTCTATGAGCAGGGACAACCTAGAAC 
GGGTGTGGGCACCCAAGGCCACCGGTGCGCTGCGCATGCACGAAGCCACCGCTGACTGCGAG 

CTCGACTGGTGGCTCGGATTCTCTTCCGCCGCTTCGCTATTGGGTTCTCCCGGGCAAGCGGCCT
ACGCGTGCGCCAGCGCGTGGCTGGACGCGCTGGTCGGATGGCGCAGGGCATCCGGCCTGCC 
GGCCGCGGTGATCAACTGGGGTCCGTGGTCGGAGGTAGGCGTCGCCCAGGCCTTGGTGGGCA
GTGTTCTCGACACGATCAGTGTCGCAGAAGGCATCGAGGCTCTCGACTCATTGCTTGCCGCCGA 
CCGGATCCGCACTGGAGTGGCTCGGCTGCGTGCCGATCGGGCCCTGGTCGCATTCCCGGAGA 

TCCGCAGCATCAGCTACTTCACCCAGGTGGTCGAGGAGCTGGACTCGGCGGGTGACCTCGGCG
ACTGGGGCGGGCCCGACGCGCTTGCCGACCTCGACCCGGGCGAGGCGCGGCGCGCGGTGAC 
CGAGCGGATGTGTGCGCGCATCGCTGCGGTGATGGGCTACACTGACCAGTCGACTGTCGAACC 
CGCCGTGCCCTTGGACAAGCCCCTGACCGAGCTGGGGCTGGATTCTCTGATGGCGGTACGAAT 
ACGCAACGGCGCGCGGGCGGATTTCGGCGTGGAACCGCCGGTAGCGCTGATACTGCAAGGCG 

CGTCCTTGCATGACCTGACGGCGGACTTAATGCGCCAACTCGGGCTCAATGATCCCGATCCGG
CGCTCAACAACGCTGACACTATTCGCGACCGGGCGCGCCAGCGCGCGGCAGCGCGACACGGA 
GCCGCGATGCGGCGCCGACCTAAACCTGAAGTACAGGGAGGATAA

>Rv2946c pks1 TB.seq 3291503:3296350 MW:166642
>emb|AL123456|MTBH37RV:c3296350-3291500, pks1 SEQ ID NO:116

GTGATTTCGGCGAGATCGGCTGAGGCGTTGACGGCGCAGGCGGGTCGACTTATGGCCCACGTG 
CAGGCCAACCCAGGGCTGGATCCGATCGATGTGGGGTGCTCGTTGGCCAGTCGCTCGGTGTTT
GAGCACCGAGCGGTGGTGGTCGGCGCAAGCCGTGAGCAACTGATTGCCGGGCTGGCTGGGCT 
CGCGGCGGGCGAGCCGGGTGCCGGCGTGGCGGTCGGTCAGCCAGGGTCGGTGGGCAAGACG 
GTGGTCGTGTTTCCTGGGCAGGGCGCGCAGCGCATCGGGATGGGCCGCGAGTTGTACGGCGA 
GTTGCCCGTGTTTGCGCAGGCATTCGATGCGGTGGCCGACGAGTTGGACCGGCATCTGCGGTT 

GCCGCTGCGCGACGTTATTTGGGGTGCCGATGCGGATTTGCTTGACAGCACCGAATTTGCTCAG
CCCGCGTTGTTCGCGGTGGAGGTGGCATCGTTCGCGGTGTTGCGGGATTGGGGTGTGCTTCCG 
GACTTCGTCATGGGTCACTCCGTTGGAGAGCTGGCGGCGGCGCACGCGGCCGGTGTGTTGAC 
GTTGGCGGACGCGGCGATGCTGGTGGTGGCGCGGGGCCGGTTGATGCAGGCGCTGCCGGCA 
GGCGGTGCGATGGTGGCGGTGGCTGCCAGTGAGGACGAGGTGGAGCCGCTGCTGGGTGAGG 

GTGTGGGGATCGCTGCGATCAACGCGCCCGAATCGGTGGTGATCTCCGGTGCGCAGGCCGCG
GCAAATGCGATTGCGGATCGGTTCGCCGCGCAGGGTCGGCGGGTGCACCAGTTGGCGGTCTC 
GCATGCGTTTCATTCGCCGTTGATGGAGCCGATGCTCGAGGAGTTCGCGCGTGTCGCGGCCCG 
GGTGCAGGCACGCGAGCCCCAGCTTGGGCTGGTGTCGAACGTGACGGGCGAGTTGGCCGGCC 
CTGATTTCGGGTCGGCGCAGTACTGGGTGGACCACGTTCGTCGGCCGGTGCGCTTCGCGGACA 

GTGCGCGTCATTTGCAGACCCTTGGGGCGACCCACTTCATCGAGGCCGGCCCGGGAAGTGGTT
TGACTGGCTCGATCGAGCAGTCCTTGGCCCCGGCTGAGGCGATGGTGGTGTCGATGCTGGGCA 
AAGACCGGCCCGAGCTGGCCTCGGCGCTCGGTGCTGCCGGTCAGGTGTTCACCACCGGTGTG 
CCGGTGCAGTGGTCGGCGGTGTTCGCCGGCTCGGGTGGACGGCGGGTGCAGCTGCCCACGTA 
TGCGTTTCAGCGACGGCGGTTTTGGGAGACGCCGGGCGCGGATGGGCCCGCCGATGCGGCCG 

GGTTGGGTCTGGGCGCGACCGAGCATGCCTTGTTGGGTGCGGTGGTCGAGCGGCCCGATTCT
GACGAGGTGGTGCTGACCGGCCGGTTGTCGCTTGCGGATCAGCCGTGGCTGGCCGACCACGT 
GGTGAACGGGGTGGTGCTGTTCCCCGGGGCGGGTTTTGTGGAGTTGGTGATCCGCGCCGGTG 
ATGAGGTCGGGTGCGCGCTCATCGAAGAGTTGGTGCTGGCCGCACCGTTGGTGATGCACCCGG 
GTGTCGGGGTTCAGGTGCAGGTGGTCGTCGGGGCTGCCGATGAATCCGGGCACCGTGCGGTG 

TCGGTGTATTCCCGCGGTGATCAATCCCAGGGTTGGTTGCTGAACGCCGAAGGCATGCTGGGG
GTGGCTGCCGCTGAGACGCCGATGGATTTGTCCGTGTGGCCGCCCGAGGGCGCGGAGAGTGT 
GGATATCTCGGACGGCTATGCGCAGTTGGCCGAGCGCGGTTATGCCTACGGCCCCGCGTTTCA 
GGGTCTGGTGGCGATCTGGCGGCGGGGGTCGGAGCTGTTCGCCGAAGTTGTAGCCCCCGGCG 
AGGCCGGCGTGGCCGTCGACCGAATGGGGATGCATCCGGCGGTGTTGGACGCGGTGCTGCAT 

GCCCTCGGGCTGGCCGTCGAGAAGACCCAGGCGAGCACCGAGACGAGACTGCCGTTTTGCTG
GCGTGGGGTGTCGCTGCATGCCGGCGGCGCTGGACGGGTGCGGGCCCGCTTCGCGTCCGCG 
GGCGCGGATGCGATTTCCGTGGACGTCTGCGACGCCACTGGGCTGCCGGTGTTGACGGTGCG
CTCGCTGGTTACTCGCCCGATAACCGCAGAACAGCTGCGCGCCGCCGTGACCGCGGCCGGCG
GTGCGTCCGATCAGGGGCCGCTGGAAGTGGTGTGGTCGCCGATCTCGGTGGTCAGCGGCGGC 
GCTAACGGGTCCGCCCCACCTGCCCCGGTGTCTTGGGCGGACTTTTGCGCCGGCAGTGATGGT 
GACGCCAGTGTCGTGGTGTGGGAACTCGAGTCTGCCGGTGGCCAAGCATCCTCGGTGGTGGG 
CTCGGTGTATGCGGCCACCCACACCGCCCTGGAGGTGTTGCAGTCCTGGCTCGGCGCGGATCG
GGCGGCCACGTTGGTGGTGTTGACCCATGGTGGCGTGGGGCTGGCTGGCGAGGACATCAGCG 
ACCTGGCCGCCGCCGCGGTGTGGGGCATGGCGCGTTCCGCGCAGGCCGAAAATCCCGGCCG 
GATCGTGTTGATCGACACCGATGCGGCGGTGGATGCCTCGGTGCTAGCCGGCGTCGGGGAAC 
CCCAGCTGCTGGTGCGCGGCGGCACTGTGCACGCCCCCCGGCTGTCCCCGGCCCCGGCGTTG 

CTAGCGTTACCGGCGGCAGAGTCGGCGTGGCGATTGGCCGCCGGTGGTGGCGGGACCCTGGA
GGATTTGGTGATCCAGCCCTGCCCGGAGGTACAGGCACCGCTACAGGCGGGGCAGGTGCGCG 
TGGCGGTGGCGGCCGTCGGGGTCAACTTCCGCGATGTGGTGGCCGCCCTAGGGATGTATCCC 
GGCCAGGCCCCACCGCTGGGTGCCGAAGGCGCCGGGGTGGTGCTTGAGACCGGTCCCGAAGT 
GACCGATCTTGCCGTCGGTGACGCCGTGATGGGATTCCTGGGCGGGGCCGGTCCGCTGGCGG 

TGGTGGATCAGCAACTGGTTACCCGGGTGCCGCAAGGCTGGTCGTTTGCTCAGGCAGCCGCTG
TGCCGGTGGTGTTCTTGACGGCCTGGTACGGGTTGGCCGATTTAGCCGAGATCAAGGCGGGCG 
AATCGGTGCTGATCCATGCCGGTACCGGCGGTGTGGGCATGGCGGCTGTGCAGCTGGCTCGC 
CAGTGGGGCGTGGAGGTTTTCGTCACCGCCAGCCGTGGCAAGTGGGACACGCTGCGCGCCAT 
GGGGTTTGACGACGACCATATCGGCGATTCCCGCACATGCGAGTTCGAGGAGAAGTTCCTGGC 

GGTCACCGAGGGCCGCGGGGTTGATGTGGTGCTCGACTCGCTGGCCGGTGAGTTCGTGGATG
CGTCGCTGCGCTTACTGGTCCGCGGTGGGCGTTTCCTCGAGATGGGCAAGACGGATATCCGCG 
ATGCGCAGGAGATCGCCGCTAATTATCCCGGCGTGCAGTATCGGGCGTTCGACCTGTCGGAGG 
CCGGCCCGGCACGCATGCAGGAGATGTTGGCCGAGGTGCGGGAGCTGTTCGACACCCGGGAG 
CTGCACCGGCTACCGGTCACCACGTGGGATGTGCGCTGCGCCCCGGCGGCCTTCCGGTTCATG 

AGCCAGGCCCGCCATATCGGCAAGGTTGTCTTAACCATGCCCTCGGCGTTGGCCGACCGGCTT
GCCGACGGCACGGTGGTGATCACCGGTGCCACCGGGGCGGTTGGTGGGGTGTTGGCCCGCCA 
CCTGGTTGGCGCCTATGGGGTGCGTCATCTGGTGTTGGCCAGTCGGCGGGGCGATCGCGCGG 
AGGGAGCGGCCGAATTGGCCGCCGACTTGACGGAGGCCGGCGCCAAGGTGCAGGTGGTGGC 
CTGTGACGTGGCCGATCGCGCTGCGGTAGCGGGGTTGTTTGCCCAGCTGTCGCGGGAGTACCC 

GCCGGTGCGCGGGGTGATTCATGCCGCCGGCGTGCTCGATGACGCAGTGATCACCTCGTTGAC
ACCGGACCGCATCGATACGGTGTTGCGGGCCAAGGTGGACGCGGCGTGGAACCTGCACCAGG 
CCACCAGTGACCTGGATTTGTCGATGTTTGCGCTGTGCTCATCGATCGCGGCCACGGTCGGCTC 
GCCGGGGCAGGGCAACTACTCGGCGGCAAACGCGTTTCTGGACGGGTTGGCCGCTCACCGGC 
AGGCCGCAGGGTTGGCGGGGATATCACTGGCGTGGGGTTTGTGGGAACAGCCTGGCGGCATG 

ACCGCGCATTTGAGCAGCCGAGATCTGGCCCGCATGAGCCGCAGCGGGCTGGCTCCGATGAG
CCCTGCCGAAGCGGTGGAATTGTTTGACGCTGCGCTGGCCATCGATCACCCTCTGGCGGTGGC 
CACGCTCTTGGACCGGGCTGCACTAGACGCCCGGGCCCAGGCCGGTGCGTTGCCGGCGCTGT
TCAGCGGGCTCGCGCGCCGCCCACGCCGACGCCAAATCGACGACACCGGTGACGCCACCTCG
TCGAAGTCGGCGCTGGCTCAACGCCTACACGGGCTGGCCGCGGACGAACAACTCGAGCTGCTA 
GTGGGGCTGGTGTGTCTGCAGGCAGCGGCAGTGCTGGGTAGGCCCTCCGCCGAGGACGTCGA 
CCCCGACACCGAATTCGGCGACCTCGGTTTCGACTCATTAACGGCTGTGGAGTTACGCAACCGC 
CTCAAAACCGCCACCGGACTGACGCTGCCACCTACCGTGATTTTCGATCATCCCACTCCCACTG
CGGTCGCCGAGTATGTCGCCCAGCAAATGTCTGGCAGCCGCCCAACGGAATCCGGTGATCCGA 
CGTCGCAGGTTGTCGAACCCGCCGCCGCGGAAGTATCGGTCCATGCCTAG 

>Rv3014c ligA DNA ligase TB.seq 3372545:3374617 MW:75258
>emb|AL123456|MTBH37RV:c3374617-3372542, ligA SEQ ID NO:117

GTGAGCTCCCCAGACGCCGATCAGACCGCTCCCGAGGTGTTGCGGCAGTGGCAGGCACTGGC 
CGAGGAGGTGCGTGAGCACCAGTTCCGTTATTACGTGCGGGACGCGCCGATCATCAGCGACGC 
GGAATTCGACGAGCTGCTGCGCCGTCTGGAAGCCCTCGAGGAGCAGCATCCCGAGCTGCGCA 
CGCCCGATTCGCCGACCCAGCTGGTCGGCGGTGCCGGCTTCGCCACGGATTTCGAGCCCGTC 

GACCATCTCGAACGAATGCTCAGCCTCGACAACGCGTTCACCGCCGACGAACTCGCCGCCTGG
GCCGGCCGCATCCATGCCGAGGTCGGAGACGCCGCACATTACCTGTGTGAGCTCAAGATCGAC 
GGCGTCGCGCTGTCTTTGGTCTACCGCGAGGGACGGCTGACCCGGGCCTCCACCCGCGGCGA 
CGGGCGCACCGGCGAGGACGTCACCCTGAACGCCCGGACCATCGCCGACGTTCCCGAACGGC 
TCACCCCCGGCGACGACTACCCGGTGCCCGAGGTCCTCGAGGTCCGCGGCGAGGTCTTCTTCC 

GGCTGGACGACTTCGAGGCGCTCAACGCCAGCCTCGTCGAGGAGGGCAAGGCGCCGTTCGCC
AACCCCCGCAACAGCGCGGCGGGATCGCTGCGCCAGAAAGACCCGGCGGTCACCGCGCGCCG 
CCGGCTGCGGATGATCTGCCACGGGCTGGGCCACGTGGAGGGCTTTCGCCCGGCCACCCTGC 
ATCAGGCATACCTGGCGTTGCGGGCATGGGGACTGCCGGTTTCCGAACACACCACCCTGGCAA 
CCGACCTGGCCGGTGTGCGCGAGCGCATCGACTACTGGGGCGAGCACCGCCACGAGGTGGAC 

CACGAAATCGACGGCGTGGTGGTCAAAGTCGACGAGGTGGCGTTGCAGCGCAGGCTGGGTTC
CACGTCGCGGGCGCCGCGCTGGGCCATCGCCTAGAAGTACCCGCCCGAGGAAGCGCAGACCA 
AGCTGCTCGACATCCGGGTGAACGTCGGCCGCACCGGGCGGATCACGCCGTTTGCGTTCATGA 
CGCCGGTGAAGGTGGCCGGGTCGACGGTGGGACAGGCCACCCTGCACAACGCCTCGGAGATC 
AAGCGCAAGGGCGTGCTGATCGGCGACACCGTGGTGATCCGCAAGGCCGGCGACGTGATCCC 

CGAGGTGCTGGGACCCGTCGTCGAACTGCGCGATGGCTCCGAACGCGAATTCATCATGCCCAC
CACCTGCCCGGAGTGCGGTTCGCCGTTGGCGCCGGAGAAGGAAGGCGACGCCGACATCCGTT 
GCCCCAACGCCCGCGGCTGCCCGGGGCAACTGCGGGAGCGGGTTTTCCACGTCGCCAGCCGC 
AACGGCCTAGACATCGAGGTGCTCGGTTACGAGGCGGGTGTGGCGCTCTTGCAGGCGAAGGT 
GATCGCCGACGAGGGCGAGCTGTTCGCGCTGACCGAGCGGGACTTGCTGCGCACCGACCTGT 

TCCGAACCAAGGCAGGCGAACTGTCGGCCAACGGCAAACGGCTGCTGGTCAACCTCGACAAGG
CCAAGGCGGCACCGCTGTGGCGGGTGCTGGTGGCGCTGTCCATCCGCCATGTCGGGCCGACG 
GCGGCCCGCGCCCTGGCCACCGAGTTCGGCAGCCTTGACGCCATCGCCGCGGCGTCCACCGA
CCAGCTGGCCGCCGTCGAGGGGGTGGGGCCGACCATTGCCGCCGCGGTCACCGAGTGGTTCG
CCGTCGACTGGCACCGCGAGATCGTCGACAAGTGGCGGGCCGCCGGGGTGCGAATGGTCGAC 
GAGCGTGACGAGAGTGTGCCACGCACGCTGGCCGGGCTGACCATCGTGGTCACCGGCTCGCT 
GACCGGTTTCTCCCGCGACGACGCCAAGGAGGCGATCGTGGCCCGCGGCGGCAAGGCCGCCG 
GCTCGGTGTCGAAGAAGACCAACTATGTCGTCGCCGGAGACTCGCCGGGATCCAAATACGACA
AGGCGGTGGAGTTGGGGGTGCCGATTCTGGACGAGGATGGGTTCCGGAGACTGCTGGCCGAC 
GGACCCGCGTCACGAACGTAA 

>Rv3025c - NifS-like protein TB.seq 3383885:3385063 MW:40948
>emb|AL123456|MTBH37RV:c3385063-3383882, Rv3025c SEQ ID NO:118

ATGGCCTACCTGGATCACGCTGCCACCACCCCGATGCACCCCGCCGCCATCGAGGCGATGGCG 
GCCGTGCAGCGCACCATCGGCAATGCGTCGTCGCTGCACACCAGCGGGCGCTCGGCGCGCCG 
GCGGATCGAGGAGGCCCGTGAGCTGATCGCGGACAAGCTAGGCGCTCGTCCGTCCGAGGTGA 
TCTTCACCGCGGGCGGCACCGAAAGCGACAACCTGGCTGTCAAAGGTATCTATTGGGCACGCC 

GCGATGCGGAGCCGCACCGCCGTCGCATCGTCACCACCGAGGTGGAACACCACGCCGTACTG
GACTCGGTGAACTGGCTCGTGGAACACGAAGGCGCCCATGTGACCTGGCTGCCGACCGCCGC 
CGACGGCTCGGTGTCGGCAACTGCGCTGCGCGAGGCACTGCAGAGCCACGACGACGTCGCGC 
TGGTATCGGTGATGTGGGCCAACAACGAGGTCGGAACTATTCTACCGATCGCCGAAATGTCAGT 
TGTCGCCATGGAATTCGGCGTGCCGATGCACAGTGATGCCATTCAGGCGGTGGGACAGCTCCC 

GCTTGACTTCGGGGCCAGCGGGCTGTCGGCGATGAGCGTGGCCGGGCACAAATTCGGTGGCC
CGCCAGGAGTGGGTGCGTTGCTGCTGCGCCGCGACGTCACCTGCGTGCCCCTTATGCACGGC 
GGTGGGCAGGAGCGCGATATTCGTTCCGGCACACCCGATGTCGCCAGTGCAGTTGGAATGGCG 
ACGGCCGCGCAGATCGCGGTGGACGGACTCGAGGAAAACAGCGCGCGGTTACGGCTGCTGCG 
GGATCGTCTGGTCGAGGGTGTGCTGGCTGAGATTGACGATGTTTGCCTTAACGGCGCCGATGA 

CCCGATGCGGCTAGCGGGTAACGCGCACTTCACTTTCCGTGGCTGCGAAGGCGATGCGCTGTT
GATGTTGTTGGACGCTAACGGAATCGAGTGCTCAACCGGATCGGCCTGCACGGCAGGTGTAGC 
GCAGCCCTCGCATGTGTTGATTGCAATGGGCGTCGACGCGGCCAGCGCCCGCGGATCATTGCG 
TCTCTCGCTGGGGCACACCAGTGTTGAGGCTGATGTCGATGCCGCGTTGGAGGTGCTTCCCGG 
GGCGGTGGCACGTGCACGGCGGGCCGCCCTAGCCGCCGCGGGAGCATCCCGATGA 

>Rv3080c pknK serine-threonine protein kinase TB.seq 3442656:3445985 MW:119420
>emb|AL123456|MTBH37RV:c3445985-3442653, pknK SEQ ID NO:119 

ATGACCGACGTTGATCCGCACGCGACGCGGCGGGACCTGGTCCCGAATATTCCCGCGGAACTG 
CTTGAGGCTGGATTCGACAATGTCGAGGAGATCGGGCGCGGCGGATTCGGCGTCGTCTACCGC 
TGCGTCCAGCCCTCGCTGGACCGCGCCGTCGCCGTCAAGGTATTGAGCACCGACCTGGATCGG
GACAATCTCGAGCGCTTCCTGCGCGAGCAGCGGGCCATGGGCCGCCTTTCCGGGCACCCGCA 
CATCGTGACCGTCTTGCAGGTGGGCGTGTTGGCGGGTGGGCGGCCCTTCATCGTGATGCCCTA
CCACGCCAAGAATTCGTTGGAGACGCTGATTCGCCGGCACGGGCCGCTGGACTGGCGCGAGA
CGCTGTCGATCGGCGTCAAGCTCGCGGGAGCGCTGGAAGCCGCGCATCGCGTCGGCACCCTG 
CACCGTGACGTGAAGCCGGGGAATATCCTGCTGACCGACTACGGGGAACCGCAGCTGACCGAT 
TTCGGAATCGCCAGAATCGCCGGGGGTTTCGAGACGGCGACCGGGGTGATTGCCGGTTCCCCG 
GCTTTCACCGCGCCGGAAGTTCTCGAAGGAGCATCGCCGACGCCCGCCTCTGACGTGTACTCC
CTGGGCGCGACGTTGTTCTGTGCGCTGACCGGCCATGCCGCCTACGAGCGCCGCAGCGGTGA 
GCGGGTGATCGCCCAGTTCCTGCGGATCACCTCGCAGCCGATCCCCGACCTGCGGAAGCAGG 
GACTGCCCGCGGACGTGGCCGCCGCCATCGAACGGGCGATGGCCCGCCATCCGGCGGATCGT 
CCCGCGACCGCGGCAGACGTTGGCGAGGAGCTTCGCGACGTTCAGCGCCGCAACGGCGTCAG 

CGTCGACGAGATGCCCCTCCCCGTCGAGCTGGGCGTGGAACGCCGACGCTCGCCCGAGGCGC
ACGCGGCGCATCGGCATACCGGCGGCGGCACCCCGACGGTCCCGACGCCTCCGACACCCGCG 
ACCAAGTACCGGCCGTCGGTGCCCACCGGCTCGCTGGTCACCCGCAGCCGGCTCACCGACAT 
CCTGCGCGCCGGCGGACGGCGCCGGCTGATCCTCATCCACGCGCCCTCGGGATTCGGCAAAA 
GCACCCTGGCGGCGCAATGGCGGGAAGAGCTCTCGCGCGACGGCGCCGCGGTCGCCTGGCT 

GACAATCGACAACGACGACAACAACGAGGTGTGGTTCTTGTCGCACCTGCTCGAGTCGATCCG
GCGGGTCCGGCCCACGCTGGCCGAGTCGTTGGGGCACGTGCTCGAAGAGCATGGGGATGACG 
CCGGCCGCTACGTGTTGACTTCGCTGATCGACGAAATCCACGAAAACGACGACCGGATCGCGG 
TGGTGATCGACGACTGGCATCGGGTGTCCGACAGCCGCACCCAAGCTGCCCTGGGTTTCCTGC 
TGGACAACGGATGTCACCACCTGCAGCTCATCGTGACCAGCTGGTCTCGCGCCGGTTTGCCGG 

TGGGCAGGTTGCGGATCGGCGACGAACTAGCCGAGATCGATTCGGCTGCTTTGCGCTTCGATA
CCGACGAGGCCGCCGCGCTGCTGAACGATGCTGGTGGTCTGCGATTGCCGCGCGCAGACGTG 
CAGGCGCTGACTACCTCTACCGACGGGTGGGCCGCGGCGCTGCGGCTGGCCGCGCTGTCGCT 
GCGCGGCGGGGGCGACGCGACCCAACTCCTGCGCGGACTTTCCGGCGCCAGTGACGTGATCC 
ACGAATTCCTGAGCGAAAACGTGCTGGACACCCTGGAACCCGAACTGCGCGAATTCCTACTGGT 

GGCATCGGTCACCGAACGCACGTGCGGCGGGCTGGCCTCGGCGCTGGCCGGGATCACCAATG
GGCGGGCGATGCTGGAAGAGGCCGAGCACCGCGGCTTGTTCCTGCAACGGACCGAAGACGAC 
CCGAATTGGTTTCGCTTCCACCAAATGTTCGCCGACTTTCTCCACCGTCGCCTCGAACGTGGCG 
GGTCGCACCGGGTGGCGGAACTGCACCGCAGGGCATCGGCCTGGTTCGCCGAGAACGGCTAC 
CTGCACGAAGCCGTCGACCATGCACTGGCCGCGGGCGATCCCGCGCGCGCCGTCGATCTTGT 

CGAGCAGGATGAAACGAACCTGCCGGAGCAGTCAAAGATGACCACACTTCTGGCAATCGTGCA
GAAACTGCCGACGTCGATGGTGGTTTCACGGGCCCGGCTCCAACTCGCCATCGCGTGGGCGAA 
CATTCTGCTGCAACGGCCGGCGCCGGCCACCGGTGCCCTGAATCGTTTCGAAACGGCCCTTGG 
CCGGGCCGAGCTTCCCGAGGCGACGCAGGCGGATCTGCGGGCCGAGGCAGACGTGTTGCGG 
GCGGTCGCCGAGGTGTTCGCAGACCGGGTCGAGCGCGTGGATGACCTTCTCGCCGAGGCAAT 

GTCGAGACCGGACACCCTGCCCCCGCGAGTCCCCGGGACCGCCGGCAACACCGCGGCGTTGG
CCGCGATCTGCCGCTTCGAGTTCGCCGAGGTATATCCACTGCTGGACTGGGCCGCGCCCTACC 
AGGAAATGATGGGACCGTTCGGCACCGTTTATGCGCAGTGCTTGCGCGGCATGGCGGCCAGGA 
ATCGGCTCGACATTGTCGCTGCGCTACAGAACTTCCGAACGGCGTTCGAGGTCGGCACGGCAG
TGGGGGCCCACTCGCACGCGGCGCGGCTTGCGGGTTCGCTGCTCGCCGAATTGCTCTACGAG 
ACCGGCGATCTGGCCGGGGCTGGTCGTCTCATGGACGAGAGCTATCTGCTGGGTTCCGAGGG 
GGGTGCAGTGGACTACCTGGCCGCCAGGTACGTGATCGGCGCGCGGGTCAAGGCGGCCCAGG 
GGGATCATGAGGGTGCGGCTGATCGCCTGTCCACCGGAGGCGATACTGCCGTCCAGCTGGGG
CTGCCGCGCCTGGCTGCCCGAATCAACAACGAGCGGATCCGGCTGGGCATCGCGCTACCTGC 
GGCGGTGGCCGCCGATTTGCTGGCACCCCGCACCATCCCCCGCGACAATGGAATCGCCACCAT 
GACAGCCGAACTCGACGAGGACTCCGCGGTGCGCCTGTTGTCCGCCGGCGACTCCGCCGATC 
GTGACCAAGCCTGCCAACGGGCCGGTGCTCTCGCCGCCGCCATCGACGGTACGCGCAGACCG 
CTGGCGGCGCTGCAGGCGCAAATACTTCATATCGAAACGCTTGCCGCCACCGGACGGGAATCC
GATGCGCGAAACGAACTGGCGCCGGTAGCCACGAAGTGCGCCGAACTCGGGCTGTCACGTCT 
GCTGGTCGATGCGGGACTGGCCTAA 

>Rv3106 fprA adrenodoxin and NADPH ferredoxin reductase TB.seq 3474004:3475371 MW:49342
>emb|AL123456|MTBH37RV:3474004-3475374, fprA SEQ ID NO:120

ATGCGTCCCTATTACATCGCCATCGTGGGCTCCGGGCCGTCGGCGTTCTTCGCCGCGGCATCC 
TTGCTGAAGGCCGCCGACACGACCGAGGACCTCGACATGGCCGTCGACATGCTGGAGATGTTG 
CCGACTCCCTGGGGGCTGGTGCGCTCCGGGGTCGCGCCGGATCACCCCAAGATCAAGTCGAT 
CAGCAAGCAATTCGAAAAGACGGCCGAGGACCCCCGCTTCCGCTTCTTCGGCAATGTGGTCGT 

CGGCGAACACGTCCAGCCCGGCGAGCTCTCCGAGCGCTACGACGCCGTGATCTACGCCGTCG
GCGCGCAGTCCGATCGCATGTTGAACATCCCCGGTGAGGACCTGCCGGGCAGTATCGCCGCC 
GTCGATTTCGTCGGCTGGTACAACGCACATCCACACTTCGAGCAGGTATCACCCGATCTGTCGG 
GCGCCCGGGCCGTAGTTATCGGCAATGGAAACGTCGCGCTAGACGTGGCACGGATTCTGCTCA 
CCGATCCCGACGTGTTGGCACGCACCGATATCGCCGATCACGCTTTGGAATCGCTACGCCCAC 

GCGGTATCCAGGAGGTGGTGATCGTCGGGCGCCGAGGTCCGCTGCAGGCCGCGTTCACCACG
TTGGAGTTGCGCGAGCTGGCCGACCTCGACGGGGTTGACGTGGTGATCGATCCGGCGGAGCT 
GGACGGCATTACCGACGAGGACGCGGCCGCGGTGGGCAAGGTCTGCAAGCAGAACATCAAGG 
TGCTGCGTGGCTATGCGGACCGCGAACCCCGCCCGGGACACCGCCGCATGGTGTTCCGGTTCT 
TGACCTCTCCGATCGAGATCAAGGGCAAGCGCAAAGTGGAGCGGATCGTGCTGGGCCGCAACG 
AGCTGGTCTCCGACGGCAGCGGGCGAGTGGCGGCCAAGGACACCGGCGAGCGCGAGGAGCT
GCCAGCTCAGCTGGTCGTGCGGTCGGTCGGCTACCGCGGGGTGCCCACGCCCGGGCTGCCGT 
TCGACGACCAGAGCGGGACCATCCCCAACGTCGGCGGCCGAATCAACGGCAGCCCCAACGAAT 
ACGTCGTCGGGTGGATCAAGCGCGGGCCGACCGGGGTGATCGGGACCAACAAGAAGGACGCC 
CAAGACACCGTCGACACCTTGATCAAGAATCTTGGCAACGCCAAGGAGGGCGCCGAGTGCAAG 

AGCTTTCCGGAAGATCATGCCGACCAGGTGGCCGACTGGCTAGCAGCACGCCAGCCGAAGCTG
GTCACGTCGGCCCACTGGCAGGTGATCGACGCTTTCGAGCGGGCCGCCGGCGAGCCGCACGG 
GCGTCCCCGGGTCAAGTTGGCCAGCCTGGCCGAGCTGTTGCGGATTGGGCTCGGCTGA 
>Rv3235 - TB.seq 3611296:3611934 MW:22659
>emb|AL123456|MTBH37RV:3611296-3611937, Rv3235 SEQ ID NO:121

ATGATGGCCAGCAACCAAACCGCTGCGCAACACTCGTCTGCCACTCTCCAGCAGGCTCCTCGTT 
CGATCGATGATGCTGGAGGGTGCCCCTTGACCATCAGTCCTATCGCGAACTCACCGGGCGACA
CCTTCGCCGTCACACCCGTCGTCGAGTACGAGCCGCCGCCGCGAAACATCCCGCCGTGCGGG 
CAATCATCGCACGCAGCCCGGCGGCCGCACACCCCGCAGCTAGCTCGCCGACAACCAATCAGG 
CCGAGCGGCCGGGCACCGGCAGCGGTCACCTCCACGGCCAAGTCACCGCGGCTGCGTCAAGC 
GGGGACCTTCGCCGATGCCGCGCTACGCCGAGTGCTGGAGGTCATCGACCGCCGCCGCCCGG 
TGGGCCAGCTGCGCCCCCTGCTGGCACCCGGCCTCGTCGACTCCGTGCTCGCGGTGAGCCGC
ACGGCGGCCGGACACCAACAAGGCGCGGCCATGCTGCGCCGCATCCGGCTGACACCGGCCGG 
ACCCGACACCGCGGACACCGCCGCCGAGGTCTTCGGCACCTACAGTCGCGGGGACCGGATCC 
ATGCGATCGCCTGCCGGGTGGAACAACGGCCCGCCGGTAACGAAACCCGATGGCTGATGGTC 
GCCCTGCACATCGGGTGA 

>Rv3255c manA mannose-6-phosphate isomerase TB.seq 3635040:3636263 MW:43340 
>emb|AL123456|MTBH37RV:c3636263-3635037, manA SEQ ID NO:122 

GTGGAACTGCTACGTGGCGCGTTACGCACCTACGCTTGGGGATCGCGCACCGCTATCGCCGAA 
TTCACCGGGCGTCCGGTGCCGGCCGCTCACCCCGAGGCCGAACTATGGTTCGGTGCACACCC 

GGGTGATCCGGCTTGGCTGCAGACGCCGCATGGCCAAACCTCGTTGCTCGAAGCGTTGGTCGC
GGATCCGGAGGGGCAGCTCGGCTCCGCGTCGCGCGCGCGATTCGGCGATGTGTTGCCGTTCT 
TGGTCAAGGTGTTGGCGGCCGACGAGCCACTATCGTTGCAGGCCCATCCGAGCGCCGAGCAG 
GCGGTTGAGGGCTACCTGCGGGAAGAGCGAATGGGCATTCCGGTGTCCTCACCCGTCCGCAAC 
TACCGCGACACCAGTCACAAGCCAGAGTTATTGGTGGCGCTGCAGCCGTTCGAGGCGCTGGCC 

GGATTCCGGGAGGCGGCTCGCACCACCGAGCTGCTGCGGGCGCTGGCCGTATCCGACCTCGA
CCCGTTCATCGACTTGCTGAGCGAGGGGTCCGATGCCGATGGTTTGCGTGCGCTGTTCACCAC 
CTGGATTACCGCACCCCAGCCCGACATCGACGTGCTGGTGCCTGCCGTGCTGGACGGCGCTAT 
CCAGTACGTCAGCTCCGGCGCAACGGAATTTGGCGCCGAAGCCAAGACAGTGCTGGAACTCGG 
CGAACGTTATCCCGGCGACGCCGGTGTGCTGGCGGCGTTGTTGCTCAACCGCATCAGCTTGGC 

TCCTGGGGAGGCGATCTTCCTGCCGGCCGGCAACCTGCACGCCTATGTGCGTGGTTTCGGTGT
GGAAGTGATGGCCAACTCCGACAACGTGTTACGCGGTGGACTTACCCCTAAGCACGTCGATGT 
GCCCGAGTTGTTGCGGGTGCTGGACTTCGCCCCCACGCCGAAGGCTCGGCTGCGGCCCCCGA 
TCCGGCGCGAGGGGCTGGGGCTGGTCTTTGAGACGCCCACCGATGAGTTCGCGGCCACGCTA 
CTGGTGCTCGACGGCGATCACCTCGGCCACGAGGTCGACGCGTCGTCCGGCCATGACGGTCC 

ACAGATCTTGTTATGCACCGAGGGTTCGGCGACGGTGCACGGGAAGTGCGGGTCGCTCACGCT
ACAGCGCGGCACGGCCGCCTGGGTGGCGGCCGACGACGGCCCGATCCGGCTGACCGCCGGC 
CAACCCGCCAAGCTGTTCAGGGCGACCGTCGGGTTGTGA 
>Rv3264c rmlA2 glucose-1-phosphate thymidyltransferase TB.seq 3644897:3645973 MW:37840
>emb|AL123456|MTBH37RV:c3645973-3644894, rmlA2 SEQ ID NO:123 

TTGGCAACTCACCAAGTCGATGCGGTGGTCCTGGTCGGTGGCAAGGGTACCCGACTGCGGCCG 
TTGACGCTGTCGGCGCCCAAGCCAATGCTGCCTACCGCCGGACTGCCGTTCCTCACCCATCTG
CTGTCGCGGATCGCCGCAGCGGGCATCGAGCACGTGATCCTGGGTACGTCCTACAAACCCGCA 
GTCTTCGAAGCGGAGTTCGGCGACGGGTCCGCACTGGGCGTACAGATCGAATACGTGACCGAG 
GAGCATCCCTTGGGGACTGGCGGCGGCATCGCCAACGTTGCCGGCAAGCTGCGCAACGACAC 
CGCGATGGTGTTTAACGGCGATGTGCTCTCGGGCGCGGATCTGGCCCAACTGCTGGACTTCCA 

CCGAAGCAATCGAGCCGATGTCACGCTGCAACTGGTGCGGGTGGGCGACCCGCGGGCATTCG
GCTGCGTACCCACCGACGAGGAGGACCGCGTAGTCGCCTTTCTGGAGAAGACGGAGGATCCG 
CCGACCGACCAGATCAATGCCGGCTGCTATGTCTTCGAACGCAACGTCATCGACCGGATTCCGC 
AGGGCCGGGAGGTTTCGGTGGAACGCGAGGTGTTCCCGGCCTTGCTCGCCGACGGCGACTGC 
AAGATCTACGGCTATGTCGATGCCAGCTATTGGCGGGACATGGGCACACCGGAAGACTTCGTTC 

GCGGATCGGCGGATCTGGTGCGCGGCATCGCCCCGTCTCCGGCCTTGCGTGGTCACCGCGGT
GAGCAGTTGGTGCACGACGGTGCGGCGGTATCTCCCGGTGCGTTGCTGATTGGCGGCACCGTC 
GTGGGGCGTGGTGCCGAAATCGGCCCCGGCACCAGATTGGACGGCGCGGTCATCTTCGATGG 
TGTCCGGGTGGAGGCCGGGTGCGTGATCGAGCGTTCGATCATCGGCTTCGGTGCTCGCATCGG 
ACCGCGGGCGTTGATCCGCGACGGTGTGATCGGTGACGGGGCCGACATCGGCGCGCGCTGCG 

AGTTGTTAAGTGGTGCCCGGGTATGGCCCGGTGTCTTTCTTCCCGACGGCGGGATCCGTTACTC
GTCCGACGTTTGA 

>Rv3368c - TB.seq 3780334:3780975 MW:23734
>emb|AL123456|MTBH37RV:c3780975-3780331, Rv3368c SEQ ID NO:124

ATGACCCTCAACCTGTCCGTCGACGAGGTCCTGACCACTACCCGCTCGGTGCGCAAGCGTCTC
GATTTCGACAAGCCGGTGCCACGCGACGTGCTGATGGAATGCCTCGAGCTGGCGCTGCAGGCG 
CCCACCGGTTCCAATTCCCAAGGCTGGCAGTGGGTGTTCGTCGAGGACGCCGCCAAGAAAAAG 
GCGATCGCCGACGTCTACCTGGCCAACGCCCGGGGCTACCTCAGCGGGCCGGCGCCCGAGTA 
CCCCGACGGCGACACCCGCGGCGAGCGGATGGGGCGGGTCCGCGATTCGGCGACCTATCTCG 

CCGAACACATGCACCGGGCGCCGGTGCTGCTGATCCCCTGCCTGAAAGGCCGGGAAGACGAG
TCGGCGGTGGGTGGCGTGTCGTTTTGGGCCTCACTGTTCCCGGCGGTGTGGAGCTTCTGCCTG 
GCGCTGCGCTCCCGCGGGCTGGGTTCGTGCTGGACGACGCTGCACCTGCTCGACAACGGCGA 
GCACAAGGTGGCCGACGTGCTCGGCATTCCCTACGACGAATACAGCCAAGGCGGGCTGCTTCC 
GATCGCCTACACACAAGGCATCGACTTCCGGCCGGCCAAGCGGCTGCCGGCCGAGAGCGTGA 

CGCACTGGAACGGCTGGTAA
>Rv3382c lytB1 TB.seq 3796447:3797433 MW:34667
>emb|AL123456|MTBH37RV:c3797433-3796444, lytB SEQ ID NO:125

ATGGCTGAGGTGTTCGTGGGACCGGTCGCACAGGGATACGCTTCGGGTGAAGTCACGGTGCTG 
TTGGCGTCGCCGCGGTCGTTTTGCGCCGGTGTAGAGCGTGCTATCGAGACGGTCAAGCGAGTG 
CTTGACGTGGCCGAAGGCCCGGTGTATGTGCGCAAGCAAATCGTGCACAACACTGTTGTGGTT
GCCGAGTTGCGGGACCGGGGAGCAGTGTTCGTCGAGGATCTCGACGAGATTCCCGATCCGCC 
GCCGCCGGGGGCGGTCGTGGTGTTCTCCGCGCATGGGGTTTCCCCGGCGGTGCGCGCGGGC 
GCTGATGAGCGGGGACTGCAGGTCGTCGACGCGACCTGCCCACTGGTGGCGAAAGTCCACGC 
TGAAGCCGCACGGTTTGCCGCGCGCGGTGACACGGTGGTCTTCATCGGGCACGCCGGACATG 

AGGAGACCGAAGGCACGCTTGGCGTCGCTCCGCGGTCAACATTATTGGTGCAGACACCCGCTG
ATGTGGCAGCGTTGAACCTGCCCGAGGGTACCCAGCTATCGTATCTGACCCAGACAACCCTGG 
CACTTGATGAAACTGCCGATGTCATTGATGCGCTGCGCGCGAGGTTTCCGACGTTGGGCCAACC 
CCCCTCTGAAGACATCTGCTATGCCACCACGAACAGACAGCGTGCGCTGCAATCGATGGTCGGT 
GAATGTGACGTTGTGTTGGTGATTGGCTCGTGCAATTCGTCGAATTCGCGGCGTCTGGTCGAGT 

TGGCGCAGCGAAGTGGGACGCCGGCCTACTTGATTGACGGGCCTGATGACATTGAGCCCGAAT
GGCTGTCGTCGGTCTCGACGATCGGTGTCACCGCGGGAGCCTCCGCGCCGCCACGACTGGTG 
GGGCAGGTGATTGATGCACTTCGCGGATACGCCTCGATCACCGTGGTGGAACGCTCGATAGCG 
ACCGAGACGGTGCGATTCGGCCTTCCCAAACAGGTTCGCGCGCAATGA 

>Rv3418c groES 10 kD chaperone TB.seq 3836985:3837284 MW:10773
>emb|AL123456|MTBH37RV:c3837284-3836982, groES SEQ ID NO:126

GTGGCGAAGGTGAACATCAAGCCACTCGAGGACAAGATTCTCGTGCAGGCCAACGAGGCCGAG 
ACCACGACCGCGTCCGGTCTGGTCATTCCTGACACCGCCAAGGAGAAGCCGCAGGAGGGCAC 
CGTCGTTGCCGTCGGCCCTGGCCGGTGGGACGAGGACGGCGAGAAGCGGATCCCGCTGGACG 
TTGCGGAGGGTGACACCGTCATCTACAGCAAGTACGGCGGCACCGAGATCAAGTACAACGGCG
AGGAATACCTGATCCTGTCGGCACGCGACGTGCTGGCCGTCGTTTCCAAGTAG 

>Rv3423c alr TB.seq 3840193:3841416 MW:43357
>emb|AL123456|MTBH37RV:c3841416-3840190, alr SEQ ID NO:127

GTGAAACGGTTCTGGGAGAATGTCGGAAAGCCAAACGACACGACAGATGGGCGGGGCACGACT
TCGTTGGCCATGACACCGATATCCCAGACACCTGGCCTCCTCGCCGAGGCCATGGTGGATCTG 
GGCGCTATTGAACACAACGTGCGGGTGCTGCGTGAGCACGCCGGCCACGCGCAGCTGATGGC 
GGTGGTCAAGGCCGACGGCTACGGTCACGGTGCTACGCGCGTCGCCCAAACCGCCCTGGGAG 
CCGGTGCGGCCGAACTCGGCGTCGCCACCGTCGACGAGGCGCTAGCGCTGCGCGCTGATGGC 

ATTACCGCACCGGTGCTGGCCTGGCTGCATCCGCCCGGCATCGACTTCGGGCCCGCGCTGCTG
GCCGACGTGCAGGTCGCGGTGTCCTCGCTGCGCCAACTCGACGAACTGTTGCACGCGGTGCG 
CCGGACCGGCCGGACGGCGACGGTGACCGTCAAGGTGGATACCGGGCTGAACCGCAATGGCG 
TGGGACCGGCACAATTCCCGGCCATGCTGACCGCGTTACGCCAAGCCATGGCCGAGGACGCC
GTCCGGCTGCGGGGGCTGATGTCGCATATGGTTTACGCCGACAAGCCTGACGATTCCATCAAC 
GATGTTCAGGCCCAACGGTTTACCGCCTTTCTGGCGCAGGCCCGCGAACAAGGGGTGCGGTTC 
GAGGTGGCGCATCTATCGAACTCATCAGCAACTATGGCGCGCCCCGACCTGACGTTCGACCTG 
GTGCGGCCGGGCATCGCGGTGTATGGGCTAAGCCCGGTACCCGCCCTCGGTGACATGGGGCT
GGTGCCGGCGATGACCGTGAAATGTGCTGTTGCGCTGGTGAAATCGATTCGTGCGGGGGAGGG 
CGTGTCGTATGGGCACACATGGATCGCGCCACGCGACACCAATCTGGCGCTGCTGCCGATCGG 
TTACGCAGACGGCGTGTTCCGGTCGCTGGGCGGGCGGCTGGAGGTGCTGATCAACGGCAGAC 
GATGCCCCGGTGTGGGGCGGATCTGCATGGACCAGTTCATGGTCGACCTGGGCCCCGGGCCG 
CTTGATGTGGCCGAAGGCGACGAGGCGATTTTGTTCGGGCCGGGCATCCGGGGTGAGCCCAC
GGCTCAGGACTGGGCCGATCTTGTCGGCACCATCCACTACGAAGTGGTCACCAGCCCGCGAGG 
ACGTATCACCAGGACCTATCGCGAGGCTGAAAACCGTTGA 

>Rv3490 otsA [alpha],-trehalose-phosphate synthase TB.seq 3908232:3909731 MW:55864
>emb|AL123456|MTBH37RV:3908232-3909734, otsA SEQ ID NO:128

ATGGCTCCCTCGGGAGGCCAGGAGGCGCAGATTTGCGATTCGGAGACCTTCGGGGACTCTGAC 
TTCGTGGTGGTAGCCAATCGACTGCCCGTCGATCTGGAGCGTCTTCCCGACGGCAGCACAACC 
TGGAAACGCAGCCCCGGAGGCTTGGTCACCGCCTTGGAGCCGGTGCTGCGGCGTCGGCGCGG 
GGCCTGGGTCGGCTGGCCCGGCGTTAACGACGACGGGGCCGAACCCGACCTCCACGTGCTGG 

ACGGCCCCATCATCCAAGACGAGCTGGAACTTCATCCGGTACGGCTGAGCACCACGGACATAG
CTCAGTACTACGAGGGATTCTCCAACGCCACACTGTGGCCGCTGTACCACGACGTCATCGTCAA 
GCCGCTCTACCACCGCGAATGGTGGGATCGCTACGTCGACGTCAACCAGCGCTTTGCCGAGGC 
CGCGTCGCGCGCCGCCGCCCACGGCGCAACCGTGTGGGTACAGGACTACCAGCTGCAGCTGG 
TACCGAAGATGCTGCGCATGCTGCGGCCCGATCTGACCATCGGTTTCTTTTTGCACATCCCGTT 

CCCGCCGGTAGAGCTGTTTATGCAGATGCCGTGGCGCACCGAGATCATCCAGGGCCTACTGGG
CGCCGACCTGGTGGGCTTCCATCTTCCGGGCGGTGCCCAGAATTTCCTGATCCTGTCCCGGCG 
TCTGGTCGGCACCGACACTTCCCGCGGAACCGTCGGTGTGCGGTCGCGGTTCGGTGCGGCGG 
TGCTCGGGTCCCGCACCATACGAGTTGGCGCCTTTCCTATCTCGGTTGACTCCGGCGCGCTCG 
ACCACGCTGCCCGCGACCGCAACATCAGGCGCCGGGCCCGCGAGATTCGCACCGAACTGGGA 

AATCCGCGCAAGATCCTGCTCGGTGTTGACCGGCTCGACTACACCAAGGGCATCGACGTACGG
CTGAAGGCCTTTTCCGAGCTGCTGGCCGAGGGCCGCGTCAAACGCGACGACACCGTCGTGGTC 
CAGCTGGCTACCCCGAGCCGCGAGCGGGTGGAGAGCTACCAGACGCTGCGCAACGACATCGA 
ACGCCAGGTCGGCCACATTAACGGCGAGTACGGTGAGGTTGGCCATCCGGTAGTGCATTACCT 
GCATCGACCGGCTCCGCGCGACGAGCTTATCGCTTTCTTCGTGGCCAGCGACGTCATGCTGGT 

CACCCCACTACGCGACGGGATGAACCTGGTGGCCAAGGAGTACGTCGCTTGCCGCAGCGATCT
TGGCGGTGCCCTGGTGCTCAGCGAATTCACCGGGGCCGCAGCCGAACTCCGGCACGCATACCT 
GGTCAACCCGCACGACCTGGAAGGCGTCAAGGACGGGATAGAGGAAGCGCTCAACCAGACGG 
AGGAGGCGGGCCGGCGGCGAATGCGGTCGCTGCGACGCCAAGTGCTCGCCCACGACGTGGA
CCGCTGGGCACAGTCGTTTCTCGACGCTCTCGCCGGGGCACACCCGAGGGGCCAAGGCTAA 

>Rv3598c lysS lysyl-tRNA synthase TB.seq 4041423:4042937 MW:55678
>emb|AL123456|MTBH37RV:c4042937-4041420, lysS SEQ ID NO:129

GTGAGTGCCGCTGACACAGCAGAAGACCTTCCTGAGCAGTTCCGGATTCGCCGGGACAAGCGC 
GCTCGCTTGCTGGCCCAGGGGCGCGATCCCTATCCCGTCGCGGTGCCGCGCACTCACACGTTG 
GCCGAGGTTCGCGCCGCCCACCCTGACTTGCCGATCGATACCGCGACCGAAGACATCGTCGGC 
GTCGCGGGCCGAGTGATCTTTGCGCGCAACTCGGGAAAGCTATGCTTTGCGACACTTCAGGAC 

GGCGATGGTACCCAGCTGCAAGTGATGATCAGCCTCGACAAGGTCGGCCAGGCTGCTCTCGAC
GCATGGAAAGCCGATGTCGACCTGGGCGACATCGTCTACGTGCATGGCGCGGTGATCAGTTCG 
CGCCGCGGCGAGCTGTCCGTCCTGGCGGATTGCTGGCGGATCGCCGCCAAGTCGCTGCGGCC 
GCTTCCCGTCGCGCACAAAGAGATGAGTGAAGAGTCGCGGGTTCGTCAGCGCTATGTTGACCT 
CATAGTTCGACCGGAAGCGCGCGCGGTGGCTCGACTACGGATCGCCGTCGTCCGCGCGATCC 

GGACGGCGCTTCAACGTCGTGGGTTCCTGGAAGTCGAGACGCCCGTCTTGCAGACGTTAGCCG
GTGGTGCGGCGGCCCGTCCGTTCGCCACTCATTCCAATGCCCTAGACATCGATCTGTACCTGCG 
GATCGCGCCGGAACTGTTCCTCAAGCGCTGCATCGTGGGTGGTTTCGACAAGGTCTTCGAACTT 
AATCGAGTGTTCCGAAACGAAGGAGCCGATTCCACGCATTCTCCGGAATTCTCCATGCTGGAGA 
CCTACCAGACCTACGGAACCTATGACGATTCGGCAGTCGTCACCCGGGAGCTTATTCAAGAGGT 

GGCCGATGAGGCGATCGGAACCAGACAACTGCCGTTGCCCGACGGCAGTGTCTATGACATCGA
CGGAGAATGGGCGACTATACAAATGTACCCGTCGCTGTCTGTGGCGCTCGGTGAAGAGATCAC 
ACCGCAGACGACGGTCGATCGCTTACGTGGGATCGCCGATAGCCTTGGCCTGGAGAAAGACCC 
AGCGATTCATGACAACCGTGGCTTCGGCCACGGCAAACTCATCGAGGAACTCTGGGAGCGCAC 
AGTGGGCAAGAGCTTGAGCGCACCCACATTTGTCAAGGATTTTCCGGTTCAGACAACGCCTTTG 

ACCCGTCAGCACCGCAGTATCCCCGGCGTAACCGAGAAGTGGGACCTCTATCTGCGCGGAATC
GAACTTGCCACCGGCTACTCGGAATTAAGCGACCCGGTAGTCCAGCGGGAGAGATTCGCCGAC 
CAGGCCCGTGCCGCGGCCGCTGGCGATGACGAAGCGATGGTGCTTGACGAGGATTTTCTGGCC 
GCTCTGGAGTACGGCATGCCACCGTGCACCGGAACCGGAATGGGTATCGATCGGTTGTTGATG 
TCTTTGACTGGGTTGTCAATTAGGGAGACAGTTTTGTTCCCGATTGTTCGACCACACTCCAACTG 

A

>Rv3600c - similar to Bacillus subtilis protein YacB TB.seq 4043041:4043856 MW:29274
>emb|AL123456|MTBH37RV:c4043856-4043038, Rv3600c SEQ ID NO:130 
GTGCTGCTGGCGATTGACGTCCGCAACACCCACACCGTTGTGGGCCTGCTGTCCGGAATGAAA 
GAGCACGCAAAGGTCGTGCAGCAGTGGCGGATACGCACCGAATCCGAAGTCACCGCCGACGAA
CTGGCACTGACGATCGACGGGCTGATCGGCGAGGATTCCGAGCGGCTCACCGGTACCGCCGC 
CTTGTCCACGGTCCCGTCCGTGCTGCACGAGGTGCGGATAATGCTCGACCAGTACTGGCCGTC 
GGTGCCGCACGTGCTGATCGAGCCCGGAGTACGCACCGGGATCCCTTTGCTCGTCGACAACCC
GAAGGAAGTGGGCGCAGACCGGATCGTGAACTGTTTGGCCGCCTATGACCGGTTCCGGAAGGC 
CGCCATCGTCGTTGACTTTGGATCCTCX3ATCTGTGTTGATGTTGTATCGGCCMGGGTGAATTTC 
TTGGCGGCGCCATCGCGCCCGGGGTGCAGGTGTCTTCCGATGCCGCGGCGGCCCGCTCGGCG 
GCATTGCGCCGCGTTGAACTTGCCCGCCCACGTTCGGTGGTTGGCAAGAACACCGTCGAATGC
ATGCAAGCCGGTGCGGTGTTCGGCTTCGCCGGGCTGGTAGACGGGTTGGTAGGCCGCATCCG 
CGAGGACGTGTCCGGTTTCTCCGTCGACCACGATGTCGCGATCGTGGCTACCGGGCATACCGC 
GCCCCTGCTGCTGCCGGAATTGCACACCGTCGACCATTACGACCAGCACCTGACCTTGCAGGG 
TCTGCGGCTGGTGTTCGAGCGTAACCTCGAAGTCCAGCGCGGCCGGCTCAAGACGGCGCGCT 
GA

>Rv3606c folK 7,8-dihydro-6-hydroxymethylpterin pyrophosphokinase TB.seq 4048181:4048744 MW:20732
>emb|AL123456|MTBH37RV:c4048744-4048178, folK SEQ ID NO:131

ATGACGCGGGTAGTGCTCTCGGTTGGCTCCAACCTGGGTGACCGCCTGGCACGATTGCGGTCG
GTCGCCGACGGTCTCGGCGATGCGTTGATTGCGGCTTCCCCGATATATGAGGCCGACCCCTGG 
GGTGGGGTGGAGCAGGGGCAGTTCCTCAATGCGGTGCTGATCGCCGACGATCCTACCTGCGAA 
CCGCGGGAGTGGCTGCGGCGGGCGCAGGAGTTCGAGCGCGCTGCGGGCAGGGTGCGTGGCC 
AGCGCTGGGGTCCACGAAATCTCGACGTCGACCTGATCGCCTGCTACCAGACCTCGGCCACCG 

AGGCTCTGGTCGAAGTGACCGCGCGGGAGAACCACCTCACGCTGCCGCACCCACTGGCGCAT
CTGCGGGCCTTTGTGTTGATCCCGTGGATTGCCGTCGACCCAACGGCGCAGCTGACGGTTGCC 
GGGTGCCCGCGGCCCGTCACGCGACTGCTGGCCGAGCTGGAGCCCGCCGACCGCGACAGTGT 
GCGGTTGTTTAGGCCGTCGTTCGATCTGAATAGCAGACACCCCGTCAGTCGGGCACCGGAAAG 
CTGA 

>Rv3607c folX may be involved in folate biosynthesis TB.seq 4048744:4049142 MW:14553
>emb|AL123456|MTBH37RV:c4049142-4048741, folX SEQ ID NO:132

ATGGCTGACCGAATCGAACTGCGCGGCCTGACCGTGCATGGTCGGCACGGGGTCTACGACCAC 
GAGCGAGTGGCCGGGCAGCGGTTTGTCATCGATGTCACCGTGTGGATAGACCTGGCCGAGGC 
CGCCAACAGCGACGACTTGGCCGACACCTATGACTACGTGCGGCTGGCTTCGCGGGCGGCCG
AGATCGTCGCCGGACCCCCGCGGAAGCTGATCGAAACGGTCGGGGCCGAGATCGCTGATCAC 
GTGATGGACGACCAGCGAGTGCATGCCGTTGAGGTGGCGGTACACAAGCCGCAGGCGCCCATT 
CCGCAGACGTTCGACGATGTGGCGGTGGTGATCCGACGCTCACGGCGCGGCGGCCGCGGTTG 
GGTAGTCCCGGCGGGCGGCGCGGTATGA 

>Rv3608c folP dihydropteroate synthase TB.seq 4049138:4049977 MW:28812 
>emb|AL123456|MTBH37RV:c4049977-4049135, folP SEQ ID NO:133

GTGAGTCCGGCGCCCGTGCAGGTGATGGGGGTTCTAAACGTCACGGACGACTCTTTCTCGGAC 
GGCGGGTGTTATCTCGATCTCGACGATGCGGTGAAGCACGGTCTGGCGATGGCAGCCGCAGGT 
GCGGGCATCGTCGACGTCGGTGGTGAGTCGAGCCGGCCCGGTGCCACTCGGGTTGACCCGGC 
GGTGGAGACGTCTCGTGTCATACCCGTCGTCAAAGAGCTTGCAGCACAAGGCATCACCGTCAG 
CATCGATACCATGCGCGCGGATGTCGCTCGGGCGGCGTTGCAGAACGGTGCCCAGATGGTCAA
CGACGTGTCGGGTGGGCGGGCCGATCCGGCGATGGGGCCGCTGTTGGCCGAGGCCGATGTG 
CCGTGGGTGTTGATGCACTGGCGGGCGGTATCGGCCGATACCCCGCATGTGCCTGTGCGCTAC 
GGCAACGTGGTGGCCGAGGTCCGTGCCGACCTGCTGGCCAGCGTCGCCGACGCGGTGGCCGC 
AGGCGTCGACCCGGCAAGGCTGGTGCTCGATCCCGGGCTTGGATTCGCCAAGACGGCGCAAC 
ATAATTGGGCGATCTTGCATGCCCTTCCGGAACTGGTCGCGACCGGAATCCCAGTGCTGGTGG
GTGCTTCGCGCAAGCGCTTCCTCGGTGCGTTGTTGGCCGGGCCCGACGGCGTGATGCGGCCA 
ACCGATGGGCGTGACACCGCGACGGCGGTGATTTCCGCGCTGGCCGCACTGCACGGGGCCTG 
GGGTGTGCGGGTGCATGATGTGCGGGCCTCGGTCGATGCCATCAAGGTGGTCGAAGCGTGGAT 
GGGAGCGGAAAGGATAGAACGCGATGGCTGA 

>Rv3609c folE GTP cyclohydrolase I TB.seq 4049977:4050582 MW:22395 
>emb|AL123456|MTBH37RV:c4050582-4049974, folE SEQ ID NO:134 

ATGTCGCAGCTGGATTCGCGCAGCGCATCTGCTCGTATCCGTGTGTTCGACCAGCAACGTGCC 
GAGGCCGCGGTGCGCGAATTGCTGTACGCGATCGGCGAGGATCCGGATAGGGACGGCTTGGT 

AGCCACCCCGTCCCGGGTTGCCCGGTCATACCGCGAAATGTTCGCCGGGCTCTACACCGACCC
CGACTCGGTGTTGAACACCATGTTCGACGAAGACCACGACGAGCTGGTGTTGGTCAAGGAAATC 
CCTATGTACTCCACCTGCGAACACCACCTGGTGGCGTTCCACGGTGTGGCCCACGTCGGCTAC 
ATCCCGGGCGACGACGGCAGGGTGACCGGCTTGTCAAAGATCGCGCGACTGGTCGATCTGTAC 
GCCAAGCGACCTCAGGTCCAGGAGCGGCTCACCAGTCAGATCGCCGATGCCCTGATGAAAAAA 

CTCGATCCACGCGGGGTAATCGTGGTGATCGAGGCTGAGCATCTGTGCATGGCGATGCGCGGG
GTTCGCAAGCCCGGCTCGGTCACCACTACGTCGGCGGTGCGCGGACTGTTCAAAACCAATGCC 
GCTTCTCGAGCCGAAGCGCTCGACCTCATTTTGCGGAAGTGA 

>Rv3610c ftsH inner membrane protein, chaperone TB.seq 4050601:4052880 MW:81987
>emb|AL123456|MTBH37RV:c4052880-4050598, ftsH SEQ ID NO:135

ATGAACCGGAAAAACGTGACTCGCACCATAACAGCGATCGCCGTCGTGGTGCTGCTCGGCTGG 
TCGTTCTTTTACTTCAGCGACGACACCCGCGGCTACAAGCCCGTTGATACCTCGGTGGCGATAA 
CACAGATCAACGGCGACAACGTCAAGAGCGCACAGATCGACGATCGCGAGCAACAGCTGCGGC 
TGATCCTGAAGAAGGGTAACAACGAGACCGACGGGTCCGAGAAGGTCATCACCAAGTACCCCA 

CCGGGTACGCCGTCGACCTGTTCAACGCGCTCAGCGCCAAAAACGCGAAGGTCAGCACGGTCG
TCAACCAGGGCAGCATCCTGGGCGAGCTGCTGGTCTACGTGCTGCCGCTGCTGTTGCTGGTGG 
GGCTGTTCGTGATGTTCTCCCGCATGCAAGGCGGCGCCCGGATGGGCTTCGGGTTCGGCAAGT 
CACGCGCCAAGCAACTGAGCAAGGACATGCCCAAGACCACCTTCGCCGACGTCGCAGGTGTCG
ACGAGGCGGTCGAGGAGCTCTACGAGATCAAGGACTTCCTGCAGAACCCCAGCAGGTACCAAG 
CGCTGGGCGCCAAGATCCCCAAAGGCGTGCTGCTCTACGGGCCGCCGGGAACCGGTAAGACG 
TTGCTGGCTCGTGCGGTGGCCGGCGAAGCCGGAGTGCCGTTCTTCACCATCTCCGGCTCCGAC 
TTCGTCGAAATGTTCGTCGGCGTCGGCGCATCCCGTGTCAGAGACCTGTTCGAGCAGGCCAAG
CAGAACAGCCCGTGCATCATCTTCGTCGACGAGATCGACGCCGTCGGCCGACAAAGAGGCGCC 
GGGCTGGGCGGCGGTCACGACGAGCGTGAGCAGACCCTCAACCAGTTGCTAGTCGAAATGGA 
CGGTTTTGGCGATCGCGCCGGCGTCATCCTGATCGCGGCCACCAACCGGCCCGACATCCTGGA 
CCCGGCGCTGTTGCGGCCGGGCCGCTTCGACCGCCAGATCCCGGTATCCAACCCCGATCTGG 

CGGGTCGGCGGGCGGTGCTGCGCGTGCACTCCAAGGGCAAGCCGATGGCCGCGGACGCCGA
CCTCGACGGACTGGCCAAGCGGACCGTCGGCATGACCGGAGCCGACCTGGCCAACGTCATCA 
ACGAGGCGGCGCTGCTGACCGCCCGGGAGAACGGCACCGTCATCACCGGTCCCGCCCTCGAG 
GAAGCGGTGGACCGGGTGATCGGCGGCCCGCGCCGCAAAGGCCGGATCATCAGCGAGCAGGA 
GAAGAAGATCACCGCCTATCACGAGGGCGGGCACACCCTGGCCGCTTGGGCGATGCCCGATAT 

CGAGCCGATTTATAAGGTGACGATCCTGGCGCGCGGGCGTACCGGCGGGCACGCGGTGGCGG
TGCCGGAAGAAGACAAGGGCCTGCGGACCCGCTCGGAAATGATCGCGCAACTGGTGTTCGCGA 
TGGGTGGGCGCGCCGCCGAAGAACTGGTGTTTCGTGAGCCGACCACCGGCGCGGTGTCCGAC 
ATCGAGCAGGCCACCAAGATAGCGCGCTCAATGGTCACCGAATTTGGAATGAGCTCCAAGCTG 
GGCGCGGTCAAATACGGCTCCGAACACGGCGACCCGTTCCTCGGACGTACCATGGGCACCCAG 

CCGGACTACTCCCACGAGGTCGCCCGCGAGATCGACGAAGAGGTCCGCAAGCTTATCGAGGCG
GCGCATACCGAAGCGTGGGAAATCCTGACCGAATACCGCGACGTGCTGGACACTTTGGCCGGC 
GAGCTGCTGGAAAAGGAGACCCTGCACCGACCCGAGCTGGAAAGCATCTTCGCTGACGTCGAA 
AAGCGGCCGCGGCTCACCATGTTCGACGACTTCGGTGGCCGGATCCCGTCGGACAAACCGCCC 
ATCAAGACACCCGGCGAGCTCGCGATCGAACGCGGCGAACCTTGGCCCCAGCCGGTCCCCGA 

GCCGGCGTTCAAGGCGGCGATTGCGCAGGCTACCCAAGCCGCTGAGGCCGCCCGGTCCGACG
CCGGCCAAACCGGGCACGGCGCCAACGGTTCGCCCGCCGGCACCCACCGGTCCGGTGACCGC 
CAGTACGGCTCCACCCAGCCTGACTACGGTGCCCCGGCGGGCTGGCATGCGCCGGGATGGCC 
CCCAAGGTCATCTCATCGGCCCAGCTATAGCGGTGAACCGGCACCGACGTATCCGGGTCAGCC 
CTACCCGACCGGTCAAGCCGATCCGGGTTCCGATGAGTCCTCGGCGGAGCAGGATGACGAGGT 

CAGTCGGACCAAGCCGGCCCACGGCTGA



>Rv3671c - TB.seq 4112322:4113512 MW:40722
>emb|AL123456|MTBH37RV:c4113512-4112319, Rv3671c SEQ ID NO:136

ATGACCCCGTCGCAGTGGCTGGATATCGCCGTCTTGGCGGTCGCATTTATTGCAGCCATCTCCG 
GCTGGCGTGCCGGTGCGCTGGGCTCAATGCTGTCGTTTGGCGGGGTGCTGCTGGGCGCGACA
GCCGGCGTGCTGCTGGCGCCGCATATCGTCAGTCAAATCAGCGCTCCGCGGGCCAAACTGTTT 
GCCGCGCTGTTCCTGATCCTGGCACTGGTCGTAGTCGGCGAGGTCGCTGGTGTGGTGCTGGGC 
CGCGCCGTCCGCGGGGCGATCCGTAACCGGCCGATCCGGTTGATCGACTCGGTCATTGGGGTA
GGGGTGCAGCTGGTCGTGGTGCTCACCGCGGCGTGGTTGTTGGCGATGCCGCTGACACAGTC 
GAAAGAGCAGCCCGAGCTGGCTGCCGCGGTGAAGGGTTCGCGGGTGCTCGCCCGGGTCAACG 
AGGCGGCACCCACCTGGCTGAAGACGGTGCCCAAGCGGCTGTCGGCCCTGCTGAACACCTCC 
GGCCTGCCCGCGGTTTTGGAGCCGTTCAGCCGCACGCCGGTCATTCCAGTGGCCTCACCCGAC
CCAGCGCTGGTCAACAATCCGGTGGTGGCGGCCACCGAGCCAAGTGTCGTCAAAATCCGCAGC 
CTGGCACCCAGATGCCAGAAAGTGTTGGAGGGCACCGGCTTCGTGATCTCACCCGATCGGGTG 
ATGACCAACGCGCACGTGGTGGCCGGATCCAACAACGTCACGGTGTATGCCGGCGACAAGCCC 
TTCGAGGCCACGGTGGTGTCCTACGACCCGTCGGTCGACGTAGCGATCCTGGCCGTTCCGCAC 

TTGCCGCCGCCGCCGCTGGTCTTCGCTGCGGAGCCGGCGAAAACCGGTGCCGACGTTGTGGT
GCTGGGTTATCCCGGCGGCGGCAATTTCACTGCCACACCCGCCAGGATTCGCGAGGCCATCAG 
ACTCAGTGGCCCCGATATTTACGGGGACCCGGAGCCGGTTACCCGCGACGTGTACACCATCAG 
AGCCGATGTGGAGCAAGGTGATTCGGGTGGGCCCCTGATCGACCTCAACGGTCAGGTGCTCGG 
TGTGGTGTTCGGCGCAGCCATCGACGACGCCGAAACTGGGTTTGTGCTGACGGCCGGCGAGGT 

GGCGGGGCAGCTTGCCAAAATCGGTGCTACCCAACCGGTCGGCACCGGGGCCTGCGTCAGCT
GA 

>Rv3682 ponA2 TB.seq 4121913:4124342 MW:84637 
>emb|AL123456|MTBH37RV:4121913-4124345, ponA SEQ ID NO:137

ATGCCCGAGCGCCTCCCGGCCGCGATCACCGTTCTGAAGCTGGCTGGGTGCTGTCTGTTGGCC
AGTGTCGTCGCCACTGCGCTGACGTTCCCGTTCGCAGGCGGGCTAGGGCTGATGTCCAATCGT 
GCCTCTGAGGTCGTTGCCAACGGCTCGGCCCAGCTGCTCGAGGGGCAAGTGCCTGCGGTATCG 
ACGATGGTCGACGCGAAGGGCAACACGATCGCGTGGCTGTACTCGCAGCGCCGGTTCGAGGT 
GCCCTCGGACAAGATCGCCAACACGATGAAGCTGGCGATCGTCTCGATTGAAGATAAGCGGTTC 

GCCGACCACAGCGGCGTGGACTGGAAGGGCACCCTGACCGGCCTGGCGGGCTACGCGTCCG
GCGACCTCGACACGCGCGGCGGCTCGACGCTCGAACAACAGTACGTGAAGAACTACCAACTGC 
TGGTGACAGCCCAAACCGATGCCGAGAAGCGAGCGGCCGTCGAAACCACTCCGGCCCGCAAG 
CTTCGCGAGATCCGGATGGCACTCACGCTGGACAAGACCTTCACAAAATCTGAAATCCTGACCC 
GATACTTGAACCTGGTCTCGTTCGGCAATAACTCGTTCGGCGTGCAGGACGCGGCGCAAACGTA 

CTTCGGCATCAACGCGTCCGACCTGAATTGGCAGCAAGCGGCGCTGCTGGCCGGCATGGTGCA
ATCGACCAGCACGCTCAACCCGTACACCAACCCCGACGGCGCGCTGGCCCGGCGGAACGTGG 
TCCTCGACACCATGATCGAGAACCTTCCCGGGGAGGCGGAGGCGTTGCGTGCCGCCAAGGCC 
GAGCCGCTGGGGGTACTGCCGCAGCCCAATGAGTTGCCGCGCGGCTGCATCGCGGCCGGCGA 
CCGCGCATTCTTCTGCGACTACGTCCAGGAGTACCTGTCTCGGGCCGGGATCAGCAAGGAGCA 

GGTCGCCACGGGCGGGTACCTGATCCGCACCACCCTGGACCCAGAGGTGCAGGCACCGGTCA
AGGCCGCCATCGACAAGTACGCCAGCCCGAACCTGGCCGGTATTTCCAGCGTGATGAGCGTGA 
TCAAACCGGGTAAGGATGCGCACAAGGTGTTGGCCATGGCCAGTAACCGCAAATACGGGCTGG 
ATCTAGAAGCCGGCGAAACCATGCGGCCGCAGCCATTCTCCCTGGTTGGCGACGGCGCCGGGT
CTATCTTCAAGATCTTCACCACGGCCGCTGCTCTGGACATGGGCATGGGTATTAACGCCCAACT 
CGACGTGCCGCCCCGATTCCAGGCCAAAGGTCTGGGAAGTGGCGGGGCAAAGGGGTGCCCCA 
AAGAGACCTGGTGTGTGGTGAACGCCGGCAACTACCGCGGCTCGATGAATGTCACCGACGCGC 
TGGCAACCTCGCCAAACACCGCGTTCGCCAAGCTGATCTCGCAGGTCGGGGTGGGGCGTGCG
GTCGATATGGCCATCAAACTCGGGCTGAGGTCTTATGCGAATCCCGGCACCGCACGCGACTAC 
AACCCCGACAGCAATGAGAGCTTGGCTGACTTCGTCAAACGACAGAACCTGGGTTCGTTCACCC 
TCGGCCCCATCGAGTTAAACGCGCTGGAGCTGTCCAACGTGGCGGCCACGTTGGCATCCGGCG 
GCGTGTGGTGCCCCCCCAACCCAATCGACCAGCTCATCGACCGCAACGGCAACGAAGTCGCGG 

TCACCACCGAGACGTGCGACCAGGTGGTGCCCGCAGGGCTGGCGAACACCCTCGCCAACGCG
ATGAGCAAGGACGCCGTGGGCAGCGGCACGGCGGCCGGTTCGGCCGGCGCGGCGGGCTGGG 
ATCTGCCGATGTCCGGCAAAACCGGCACCACCGAGGCGCACCGGTCGGCCGGCTTCGTGGGC 
TTCACCAACCGCTACGCGGCGGCGAACTACATCTACGACGACTCCAGCTCGCCGACAGATCTGT 
GTTCCGGCCCGCTGCGCCATTGCGGCAGCGGCGACTTGTACGGCGGCAACGAGCCATCCCGC 

ACCTGGTTCGCCGCGATGAAGCCGATCGCCAACAACTTCGGCGAAGTGCAGCTACCACCGACC
GATCCACGCTATGTCGACGGCGCACCAGGCTCACGGGTACCAAGCGTGGCCGGTCTGGATGTC 
GACGCCGCACGCCAGCGCCTCAAGGACGCGGGCTTCCAGGTCGCCGACCAAACCAACTCGGT 
CAACAGCTCCGCCAAGTATGGTGAGGTGGTCGGAACGTCGCCCAGCGGTCAAACAATTCCGGG 
TTCGATCGTCACGATCCAGATCAGCAACGGCATCCCGCCGGCTCCGCCTCCGCCACCGCTGCC 

TGAGGATGGTGGGCCGCCACCGCCGGTCGGATCGCAGGTGGTGGAGATTCCGGGGCTGCCGC
CGATCACCATTCCGCTGCTGGCGCCACCACCCCCAGCGCCTCCCCCGTAG 

>Rv3721c dnaZX DNA polymerase III, [gamma] (dnaZ) and [tau] (dnaX) TB.seq 4164995:4166728 MW:61892
>emb|AL123456|MTBH37RV:c4166728-4164992, dnaZX SEQ ID NO:138

GTGGCTCTCTACCGCAAGTACCGACCGGCAAGCTTCGCGGAGGTGGTGGGGCAGGAGCACGT
CACCGCGCCGCTGTCGGTGGCGCTGGATGCCGGCCGGATCAACCACGCGTACCTGTTCTCTGG 
GCCGCGTGGCTGCGGAAAGACGTCGTCAGCGCGTATCCTGGCGCGGTCGTTGAACTGTGCGCA 
GGGCCCTACCGCCAACCCGTGCGGGGTCTGCGAATCCTGCGTTTCGTTGGCGCCCAACGCCCC 
CGGCAGCATCGACGTGGTAGAGCTGGATGCCGCCAGCCACGGCGGCGTGGACGACACCCGCG 

AGCTGCGGGACCGCGCGTTCTATGCGCCGGTCCAGTCACGGTACCGGGTATTTATCGTCGACG
AGGCGCACATGGTGACCACCGCGGGATTCAACGCGCTGCTCAAGATCGTGGAGGAACCGCCC 
GAACACCTGATCTTCATATTCGCCACCACCGAACCGGAGAAGGTACTGCCGACGATTCGGTCGC 
GCACTCATCACTACCCGTTCCGGCTGCTGCCGCCGCGCACTATGCGGGCGTTGCTCGCGCGGA 
TCTGCGAGCAGGAGGGCGTCGTCGTCGACGATGCGGTGTACCCGTTGGTGATCCGGGCCGGC 

GGAGGTTCCCCACGGGATACGCTCTCGGTGCTGGACCAATTGCTGGCTGGGGCCGCGGACAC
CCACGTGACCTACACCCGGGCGCTGGGGCTGCTGGGTGTCACCGACGTCGCCCTGATCGACG 
ACGCGGTCGACGCACTGGCCGCTTGCGATGCGGCCGCATTGTTCGGGGCGATCGAATCGGTGA 
TCGATGGCGGACATGACCCTCGGCGTTTCGCTACCGATCTGCTGGAGCGATTCCGCGACCTGA
TTGTGCTGCAATCGGTTCCCGACGCGGCATCTCGCGGGGTGGTGGATGCGCCCGAAGACGCG 
CTGGATCGGATGCGCGAGCAAGCCGCCCGGATCGGGCGGGCGACCCTGACCCGATATGCCGA 
GGTGGTGCAGGCCGGGCTAGGCGAGATGCGCGGTGCGACCGCGCCGCGTCTGCTGCTGGAA 
GTGGTTTGCGCGCGACTGCTGCTGCCCTCGGCGAGCGACGCCGAATCGGCACTGTTGCAGCG
GGTCGAACGGATCGAGACCCGGTTGGACATGTCGATCCCGGCGCCGCAAGCCGTACCACGCC 
CGTCGGCTGCGGCTGCCGAGCCGAAACACCAGCCCGCGCGTGAACCGAGACCGGTGCTGGCC 
CCCACACCGGCCTCGAGCGAACCCACCGTGGCCGCGGTTCGGTCCATGTGGCCGACGGTGCG 
CGACAAGGTGCGCCTGCGCAGCCGTACCACCGAGGTGATGCTGGCGGGTGCCACCGTCCGTG 

CGCTAGAGGACAACACGCTGGTGCTGACCCACGAATCGGCGCCGCTGGCGCGGCGGCTGTCC
GAACAGCGCAACGCCGATGTCCTCGCCGAGGCGCTTAAAGACGCGCTGGGAGTCAACTGGCG 
GGTGCGGTGTGAGACCGGTGAACCGGCTGCGGCGGCATCACCCGTCGGCGGGGGAGCGAAC 
GTGGCGACCGCCAAGGCCGTAAACCCTGCCCCCACAGCGAATTCCACTCAGCGCGACGAAGAG 
GAGCACATGCTCGCCGAAGCCGGCCGTGGCGACCCGTCGCCGCGTCGCGACCCGGAAGAGGT 

TGCACTCGAGCTGCTGCAGAACGAGCTGGGCGCGCGCCGGATAGACAACGCCTAG

>Rv3783 - TB.seq 4229255:4230094 MW:32337 

>emb|AL123456|MTBH37RV:4229255-4230097, Rv3783 SEQ ID NO:139 

ATGACATTCATGGATGCTCAAGCTAGCTTCCAGACACAGTCGCGGACACTGGCCCGCGTCCGA 
GGCGATCTGGTCGACGGGTTCCGCCGCCACGAGCTGTGGCTGCACCTGGGCTGGCAGGACAT
CAAGCAGCGGTACCGCCGCTCGGTGCTGGGGCCGTTCTGGATCACCATCGCCACCGGAACGA 
CCGCCGTCGCGATGGGCGGCCTGTATTCCAAGCTGTTTCGGCTCGAGCTGTCTGAGCACCTGC 
CCTACGTCACGCTCGGGCTGATCGTCTGGAACCTGATCAACGCCGCCATCCTGGACGGCGCAG 
AGGTTTTCGTCGCCAACGAAGGTCTGATCAAACAGCTGCCGGCACCGTTGAGCGTGCACGTCTA 
TCGGTTGGTGTGGCGGCAGATGATCTTCTTCGCCCACAACATCGTCATCTACTTCGTCATCGCG
ATCATCTTTCCTAAGCCGTGGTCGTGGGCGGATCTGTCGTTTCTTCCGGCGCTGGCGCTCATTT 
TCCTCAATTGCGTTTGGGTGTCACTGTGTTTCGGCATCCTGGCGACCCGCTACCGCGACATCGG 
CCCGCTGCTGTTTTCCGTTGTGCAGTTGTTGTTCTTCATGACGCCGATCATCTGGAACGACGAGA 
CCCTGCGTCGGCAGGGCGCGGGCCGCTGGTCGAGCATCGTCGAGCTCAACCCGCTGCTGCAC 
TATCTGGACATCGTGCGGGCGCCACTGTTGGGCGCTCACCAGGAGCTGCGGCACTGGCTGGTG
GTGCTGGTGTTGACCGTCGTCGGCTGGATGCTGGCGGCGTTCGCGATGCGGCAGTATCGCGC 
GCGGGTGCCCTACTGGGTGTAG 



>Rv3789 - TB.seq 4235371:4235733 MW:13378 
>emb|AL123456|MTBH37RV:4235371-4235736, Rv3789 SEQ ID NO:140

ATGCGGTTCGTTGTCACCGGCGGCCTCGCTGGGATAGTTGACTTTGGCCTCTACGTCGTGCTGT 
ACAAGGTGGCGGGCCTACAGGTCGACCTGTCCAAGGCCATCAGCTTCATCGTCGGCACCATCA 
CCGCGTACCTGATCAACCGCCGGTGGACATTCCAGGCCGAGCCCAGCACGGCCCGATTCGTCG
CGGTCATGCTCCTCTACGGAATCACCTTCGCCGTGCAGGTCGGACTCAACCACCTCTGCCTCGC 
ACTCTTGCACTACCGGGCGTGGGCCATCCCCGTCGCGTTTGTGATCGCGCAGGGCACCGCCAC 
GGTAATCAACTTCATCGTGCAGCGAGCCGTGATCTTCCGGATCCGCTGA 

>Rv3790 - TB.seq 4235776:4237158 MW:50164 

>emb|AL123456|MTBH37RV:4235776-4237161, Rv3790 SEQ ID NO:141

ATGTTGAGCGTGGGAGCTACCACTACCGCCACCCGGCTGACCGGGTGGGGCCGCACAGCGCC 
GTCGGTGGCGAATGTGCTTCGCACCCCAGATGCCGAGATGATCGTCAAGGCGGTGGCTCGGGT 

CGCCGAGTCGGGGGGCGGCCGGGGTGCTATCGCGCGCGGGCTGGGCCGCTCCTATGGGGAC
AACGCCCAAAACGGCGGTGGGTTGGTGATCGACATGACGCCGCTGAACACTATCCACTCCATTG 
ACGCCGACACCAAGCTGGTCGACATCGACGCCGGGGTCAACCTCGACCAACTGATGAAAGCCG 
CCCTGCCGTTCGGGCTGTGGGTCCCGGTGCTGCCGGGAACCCGGCAGGTCACCGTCGGCGGG 
GCGATCGCCTGCGATATCCACGGCAAGAACCATCACAGCGCTGGCAGCTTCGGTAACCACGTG 

CGCAGCATGGACCTGCTGACCGCCGACGGCGAGATCCGTCATCTCACTCCGACCGGCGAGGA
CGCCGAACTGTTCTGGGCCACCGTCGGGGGCAACGGTCTCACCGGCATCATCATGCGGGCCAC 
CATCGAGATGACGCCCACTTCGACGGCGTACTTCATCGCCGACGGCGACGTCACCGCCAGCCT 
CGACGAGACCATCGCCCTGCACAGCGACGGCAGCGAAGCGCGCTACACCTATTCCAGTGCCTG 
GTTCGACGCGATCAGCGCTCCCCCGAAGCTGGGCCGCGCGGCGGTATCGCGTGGCCGCCTGG 

CCACCGTCGAGCAATTGCCTGCGAAACTGCGGAGCGAACCTTTGAAATTCGATGCGCCACAGCT
ACTTACGTTGCCCGACGTGTTTCCCAACGGGCTGGCCAACAAATATACCTTCGGCCCGATCGGC 
GAACTGTGGTACCGCAAATCCGGCACCTATCGCGGCAAGGTCCAGAACCTCACGCAGTTCTACC 
ATCCGCTGGACATGTTCGGCGAATGGAACCGCGCCTACGGCCCAGCGGGCTTCCTGCAATATC 
AGTTCGTGATCCCCACAGAGGCGGTTGATGAGTTCAAGAAGATCATCGGCGTTATTCAAGCCTC 

GGGTCACTACTCGTTTCTCAACGTGTTCAAGCTGTTCGGCCCCCGCAACCAGGCGCCGCTCAGC
TTCCCCATCCCGGGCTGGAACATCTGCGTCGACTTCCCCATCAAGGACGGGCTGGGGAAGTTC 
GTCAGCGAACTCGACCGCCGGGTACTGGAATTCGGCGGCCGGCTCTACACCGCCAAAGACTCC 
CGTACCACCGCCGAAACCTTTCATGCCATGTATCCGCGCGTCGACGAATGGATCTCCGTGCGCC 
GCAAGGTCGATCCGCTGCGCGTATTCGCCTCCGACATGGCCCGACGCTTGGAGCTGCTGTAG 

>Rv3791 - TB.seq 4237162:4237923 MW:27470 

>emb|AL123456|MTBH37RV:4237162-4237926, Rv3791 SEQ ID NO:142

ATGGTTCTTGATGCCGTAGGAAACCCCCAGACGGTGCTGCTGCTCGGTGGCACCTCCGAGATC 
GGGCTCGCCATCTGCGAGCGCTACCTGCACAATTCGGCGGCCCGCATCGTGCTGGCCTGCCTG 
CCCGACGACCCACGGCGGGAGGACGCGGCCGCTGCGATGAAGCAGGCCGGCGCGCGGTCGG
TGGAGCTGATCGACTTTGACGCCCTGGATACCGACAGCCACCCGAAGATGATCGAGGCGGCCT 
TCTCCGGCGGTGATGTGGACGTGGCTATCGTCGCGTTCGGCTTGCTCGGCGACGCCGAAGAGC 
TGTGGCAGAACCAGCGCAAGGCGGTGCAGATCGCCGAAATCAACTACACCGCAGCGGTTTCGG
TGGGCGTGCTGCTGGCTGAGAAGATGCGCGCTCAGGGCTTCGGTCAGATCATCGCGATGAGCT 
CGGCCGCCGGTGAGCGGGTGCGACGGGCGAACTTCGTCTACGGCTCCACCAAGGCCGGTCTG 
GACGGGTTTTACCTGGGGTTGTCAGAAGCGCTGCGCGAGTACGGTGTTCGTGTGCTGGTGATC 
CGGCCCGGCCAGGTGCGTACCCGGATGAGCGCGCACCTCAAGGAAGCTCCATTGACCGTCGA
CAAGGAGTACGTCGCCAACCTCGCGGTGACCGCGTCCGCAAAAGGTAAGGAATTGGTTTGGGC 
GCCAGCAGCGTTCCGCTACGTCATGATGGTGTTGCGTCACATCCCGCGGAGCATCTTCCGCAA 
GCTGCCCATCTGA 

>Rv3794 embA TB.seq 4243230:4246511 MW:115694

>emb|AL123456|MTBH37RV:4243230-4246514, embA SEQ ID NO:143 

GTGCCCCACGACGGTAATGAGCGATCTCACCGGATCGCACGCCTAGCAGCCGTCGTCTCGGGA 
ATCGCGGGTCTGCTGCTGTGCGGCATCGTTCCGCTGCTTCCGGTGAACCAAACCACCGCGACC 
ATCTTCTGGCCGCAGGGCAGCACCGCCGACGGCAACATCACCCAGATCACCGCCCCTCTGGTA 

TCCGGGGCGCCACGCGCGCTGGACATCTCGATCCCCTGCTCGGCCATCGCCACGCTGCCCGC
CAACGGCGGCCTGGTGCTGTCCACACTGCCGGCCGGTGGCGTGGATACCGGTAAGGCCGGGC 
TGTTCGTCCGCGCCAACCAGGACACGGTCGTCGTGGCGTTCCGCGACTCGGTGGCCGCGGTG 
GCGGCCCGCTCCACGATCGCAGCGGGAGGCTGTAGCGCGCTGCATATCTGGGCCGATACCGG 
CGGCGCGGGCGCTGATTTTATGGGTATACCCGGCGGCGCCGGGACCCTGCCGCCGGAGAAGA 

AGCCACAGGTTGGCGGCATCTTCACCGACCTGAAGGTCGGAGCGCAGCCCGGGCTGTCGGCC
CGCGTCGACATCGACACTCGGTTTATCACGACGCCCGGCGCGCTCAAGAAGGCCGTGATGCTC 
CTCGGCGTGCTGGCGGTCCTGGTAGCCATGGTGGGGCTGGCCGCGCTGGACCGGCTCAGCAG 
GGGCCGCACCCTGCGCGACTGGCTGACCCGATATCGCCCGCGGGTGCGGGTCGGATTCGCCA 
GCCGGCTCGCTGACGCAGCGGTGATCGCGACCTTGTTGCTCTGGCATGTCATCGGCGCCACCT 

CGTCCGATGACGGCTACCTTCTGACCGTCGCCCGGGTCGCCCCGAAGGCCGGCTATGTAGCCA
ACTACTACCGGTATTTCGGCACGACGGAGGCGCCGTTCGACTGGTATACATCGGTGCTTGCCCA 
GCTGGCGGCGGTGAGCACCGCCGGCGTCTGGATGCGCCTGCCCGCCACCCTGGCCGGAATCG 
CCTGCTGGCTGATCGTCAGCCGTTTCGTGCTGCGGCGGCTGGGACCGGGCCCGGGCGGGCTG 
GCGTCCAACCGGGTCGCTGTGTTCACCGCTGGTGCGGTGTTCCTGTCCGCCTGGCTGCCGTTC 

AACAACGGCCTGCGTCCCGAGCCGCTGATCGCGCTGGGTGTGCTGGTCACGTGGGTGTTGGTG
GAACGGTCGATCGCGCTCGGACGGCTGGCCCCGGCCGCGGTAGCCATCATCGTGGCGACGCT 
TACCGCGACGCTGGCACCGCAGGGGTTGATCGCGCTGGCCCCGCTGCTGACTGGTGCGCGCG 
CCATCGCCCAGAGGATCCGGCGCCGCCGGGCGACCGATGGACTGCTGGCGCCGCTGGCGGT 
GCTGGCCGCGGCGTTGTCGCTGATCACCGTGGTGGTGTTTCGGGACCAGACGCTGGCCACGGT 

GGCCGAATCGGCACGCATCAAGTACAAGGTCGGCCCGACCATCGCCTGGTACCAGGACTTCCT
GCGCTACTACTTCCTTACCGTGGAGAGCAACGTTGAGGGGTCGATGTCCCGCCGGTTCGCGGT 
GCTGGTGTTGCTGTTCTGCCTGTTCGGGGTGCTGTTCGTGCTGCTGCGGCGCGGCCGGGTGGC 
GGGGCTGGCCAGCGGCCCGGCCTGGCGACTGATCGGCACTACGGCGGTCGGCCTGCTGCTGC
TCACGTTCACGCCAACCAAGTGGGCCGTGCAGTTCGGCGCATTCGCCGGGCTGGCCGGGGTGT 
TGGGTGCGGTCACCGCGTTCACCTTTGCCCGCATCGGTCTACATAGTCGACGCAACCTCACGCT 
GTACGTGACCGCGTTGCTGTTCGTGCTGGCGTGGGCAACCTCGGGCATCAACGGGTGGTTCTA 
CGTCGGCAACTACGGGGTGCCGTGGTATGACATCCAGCCCGTCATCGCCAGCCACCCGGTGAC
GTCGATGTTTCTGACGCTGTCGATCCTCACCGGATTGCTGGCAGCCTGGTATCACTTCCGGATG 
GACTACGCCGGGCACACCGAAGTCAAAGACAACCGGCGCAACCGCATCTTGGCCTCTACGCCA 
CTGCTGGTGGTCGCGGTGATCATGGTCGCAGGCGAAGTCGGCTCGATGGCCAAGGCCGCGGT 
GTTCCGTTACGCGCTTTACACCACCGCCAAGGCCAACCTGACCGCGCTCAGCACCGGGCTGTC 

CAGCTGTGCGATGGCCGACGACGTGCTGGCCGAGCCCGACCCCAATGCCGGCATGCTGCAAC
CGGTTCCGGGCCAGGCGTTCGGACCGGACGGACCGCTGGGCGGTATCAGTCCCGTCGGCTTC 
AAACCCGAGGGCGTGGGCGAGGACCTCAAGTCCGACCCGGTGGTCTCCAAACCCGGGCTGGT 
CAACTCCGATGCGTCGCCCAACAAACCCAACGCCGCCATCACCGACTCCGCGGGCACCGCCGG 
AGGGAAGGGCCCGGTCGGGATCAACGGGTCGCACGCGGCGCTGCCGTTCGGATTGGACCCGG 

CACGTACCCCGGTGATGGGCAGCTACGGGGAGAACAACCTGGCCGCCACGGCCACCTCGGCC
TGGTACCAGTTACCGCCCCGCAGCCCGGACCGGCCGCTGGTGGTGGTTTCCGCGGCCGGCGC 
CATCTGGTCCTACAAGGAGGACGGCGATTTCATCTACGGCCAGTCCCTGAAACTGCAGTGGGG 
CGTCACCGGCCCGGACGGCCGCATCCAGCCACTGGGGCAGGTATTTCCGATCGACATCGGACC 
GCAACCCGCGTGGCGCAATCTGCGGTTTCCGCTGGCCTGGGCGCCGCCGGAGGCCGACGTGG 

CGCGCATTGTCGCCTATGACCCGAACCTGAGCCCTGAGCAATGGTTCGCCTTCACCCCGCCCC
GGGTTCCGGTGCTGGAATCTCTGCAGCGGTTGATCGGGTCAGCGACACCGGTGTTGATGGACA 
TCGCGACCGCAGCCAACTTCCCCTGCCAGCGACCGTTTTCCGAGCATCTCGGCATTGCCGAGC 
TTCCGCAGTACCGGATCCTGCCGGACCACAAGCAGACGGCGGCGTCGTCGAACCTATGGCAGT 
CCAGCTCGACCGGCGGTCCGTTCCTGTTCACCCAGGCGCTGCTGCGCACCTCGACGATCGCCA 

CGTACCTGCGTGGGGACTGGTATCGCGACTGGGGATCGGTGGAGCAGTACCACCGGCTGGTG
CCGGCCGATCAGGCTCCAGACGCCGTTGTCGAGGAGGGCGTGATCACTGTGCCCGGCTGGGG 
TCGGCCAGGACCGATCAGGGCGCTGCCATGA 

>Rv3795 embB TB.seq 4246511:4249804 MW:118023

>emb|AL123456|MTBH37RV:4246511-4249807, embB SEQ ID NO:144

ATGACACAGTGCGCGAGCAGACGCAAAAGCACCCCAAATCGGGCGATTTTGGGGGCTTTTGCG 
TCTGCTCGCGGGACGCGCTGGGTGGCCACCATCGCCGGGCTGATTGGCTTTGTGTTGTCGGTG 
GCGACGCCGCTGCTGCCCGTCGTGCAGACCACCGCGATGCTCGACTGGCCACAGCGGGGGCA 
ACTGGGCAGCGTGACCGCCCCGCTGATCTCGCTGACGCCGGTCGACTTTACCGCCACCGTGCC 

GTGCGACGTGGTGCGCGCCATGCCACCCGCGGGCGGGGTGGTGCTGGGCACCGCACCCAAG
CAAGGCAAGGACGCCAATTTGCAGGCGTTGTTCGTCGTCGTCAGCGCCCAGCGCGTGGACGTC 
ACCGACCGCAACGTGGTGATCTTGTCCGTGCCGCGCGAGCAGGTGACGTCCCCGCAGTGTCAA 



WO 01/35317 PCT/US00/31152 

CGCATCGAGGTCACCTCTACCCACGCCGGCACCTTCGCCAACTTCGTCGGGCTCAAGGACCCG 
TCGGGCGCGCCGCTGCGCAGCGGCTTCCCCGACCCCAACCTGCGCCCGCAGATTGTCGGGGT 
GTTCACCGACCTGACCGGGCCCGCGCCGCCCGGGCTGGCGGTCTCGGCGACCATCGACACCC 
GGTTCTCCACCCGGCCGACCACGCTGAAACTGCTGGCGATCATCGGGGCGATCGTGGCCACCG 
TCGTCGCACTGATCGCGTTGTGGCGCCTGGACCAGTTGGACGGGCGGGGCTCAATTGCCCAGC
TCCTCCTCAGGCCGTTCCGGCCTGCATCGTCGCCGGGCGGCATGCGCCGGCTGATTCCGGCAA 
GCTGGCGCACCTTCACCCTGACCGACGCCGTGGTGATATTCGGCTTCCTGCTCTGGCATGTCAT 
CGGCGCGAATTCGTCGGACGACGGCTACATCCTGGGCATGGCCCGAGTCGCCGACCACGCCG 
GCTACATGTCCAACTATTTCCGCTGGTTCGGCAGCCCGGAGGATCCCTTCGGCTGGTATTACAA 

CCTGCTGGCGCTGATGACCCATGTCAGCGACGCCAGTCTGTGGATGCGCCTGCCAGACCTGGC
CGCCGGGCTAGTGTGCTGGCTGCTGCTGTCGCGTGAGGTGCTGCCCCGCCTCGGGCCGGCGG 
TGGAGGCCAGCAAACCCGCCTACTGGGCGGCGGCCATGGTCTTGCTGACCGCGTGGATGCCG 
TTCAACAACGGCCTGCGGCCGGAGGGCATCATCGCGCTCGGCTCGCTGGTCACCTATGTGCTG 
ATCGAGCGGTCCATGCGGTACAGCCGGCTCACACCGGCGGCGCTGGCCGTCGTTACCGCCGC 

ATTCACACTGGGTGTGCAGCCCACCGGCCTGATCGCGGTGGCCGCGCTGGTGGCCGGCGGCC
GCCCGATGCTGCGGATCTTGGTGCGCCGTCATCGCCTGGTCGGCACGTTGCCGTTGGTGTCGC 
CGATGCTGGCCGCCGGCACCGTCATCCTGACCGTGGTGTTCGCCGACCAGACCCTGTCAACGG 
TGTTGGAAGCCACCAGGGTTCGCGCCAAAATCGGGCCGAGCCAGGCGTGGTATACCGAGAACC 
TGCGTTACTACTACCTCATCCTGCCCACCGTCGACGGTTCGCTGTCGCGGCGCTTCGGCTTTTT

GATCACCGCGCTATGCCTGTTCACCGCGGTGTTCATCATGTTGCGGCGCAAGCGAATTCCCAGC
GTGGCCCGCGGACCGGCGTGGCGGCTGATGGGCGTCATCTTCGGCACCATGTTCTTCCTGATG 
TTCACGCCCACCAAGTGGGTGCACCACTTCGGGCTGTTCGCCGCCGTAGGGGCGGCGATGGC 
CGCGCTGACGACGGTGTTGGTATCCCCATCGGTGCTGCGCTGGTCGCGCAACCGGATGGCGTT 
CCTGGCGGCGTTATTCTTCCTGCTGGCGTTGTGTTGGGCCACCACCAACGGCTGGTGGTATGTC 

TCCAGCTACGGTGTGCCGTTCAACAGCGCGATGCCGAAGATCGACGGGATCACAGTCAGCACA
ATCTTTTTCGCCCTGTTTGCGATCGCCGCCGGCTATGCGGCCTGGCTGCACTTCGCGCCCCGC 
GGCGCCGGCGAAGGGCGGCTGATCCGCGCGCTGACGACAGCCCCGGTACCGATCGTGGCCG 
GTTTCATGGCGGCGGTGTTCGTCGCGTCCATGGTGGCCGGGATCGTGCGACAGTACCCGACCT 
ACTCCAACGGCTGGTCCAACGTGCGGGCGTTTGTCGGCGGCTGCGGACTGGCCGACGACGTA 

CTCGTCGAGCCTGATACCAATGCGGGTTTCATGAAGCCGCTGGACGGCGATTCGGGTTCTTGG
GGCCCCTTGGGCCCGCTGGGTGGAGTCAACCCGGTCGGCTTCACGCCCAACGGCGTACCGGA 
ACACACGGTGGCCGAGGCGATCGTGATGAAACCCAACCAGCCCGGCACCGACTACGACTGGGA 
TGCGCCGACCAAGCTGACGAGTCCTGGCATCAATGGTTCTACGGTGCCGCTGCCCTATGGGCT 
CGATCCCGCCCGGGTACCGTTGGCAGGCACCTACACCACCGGCGCACAGCAACAGAGCACACT 

CGTCTCGGCGTGGTATCTCCTGCCTAAGCCGGACGACGGGCATCCGCTGGTCGTGGTGACCGC
CGCGGGCAAGATCGCCGGCAACAGCGTGCTGCACGGGTACACCCCCGGGCAGACTGTGGTGC 
TCGAATACGCCATGCCGGGACCCGGAGCGCTGGTACCCGCCGGGCGGATGGTGCCCGACGAC 
CTATACGGAGAGCAGCCCAAGGCGTGGCGCAACCTGCGCTTCGCCCGAGCAAAGATGCCCGC
CGATGCCGTCGCGGTCCGGGTGGTGGCCGAGGATCTGTCGCTGACACCGGAGGACTGGATCG 
CGGTGACCCCGCCGCGGGTACCGGACCTGCGCTCACTGCAGGAATATGTGGGCTCGACGCAG 
CCGGTGCTGCTGGACTGGGCGGTCGGTTTGGCCTTCCCGTGCCAGCAGCCGATGCTGCACGC 
CAATGGCATCGCCGAAATCCCGAAGTTCCGCATCACACCGGACTACTCGGCTAAGAAGCTGGAC
ACCGACACGTGGGAAGACGGCACTAACGGCGGCCTGCTCGGGATCACCGACCTGTTGCTGCG 
GGCCCACGTCATGGCCACCTACCTGTCCCGCGACTGGGCCCGCGATTGGGGTTCCCTGCGCAA 
GTTCGACACCCTGGTCGATGCCCCTCCCGCCCAGCTCGAGTTGGGCACCGCGACCCGCAGCG 
GCCTGTGGTCACCGGGCAAGATCCGAATTGGTCCATAG 

>Rv3834c serS seryl-tRNA synthase TB.seq 4307655:4308911 MW:45293
>emb|AL123456|MTBH37RV:c4308911-4307652, serS SEQ ID NO:145

GTGATCGACCTGAAGCTGCTTCGTGAAAACCCCGACGCGGTACGCCGCTCACAACTCAGCCGC 
GGCGAGGACCCGGCGCTGGTAGATGCCCTGCTGACGGCCGACGCCGCCCGCCGGGCCGTGA 

TCTCGACCGCCGATTCGTTACGGGCCGAGCAGAAAGCCGCCAGCAAAAGCGTGGGTGGCGCG
TCTCCCGAAGAGCGCCCGCCGCTGCTGCGGCGCGCGAAGGAACTCGCCGAGCAGGTCAAAGC 
CGCTGAGGCCGACGAGGTCGAAGCGGAGGCGGCGTTCACCGCGGCGCACCTGGCGATCTCGA 
ATGTCATCGTGGACGGGGTACCCGCCGGCGGGGAGGACGACTACGCGGTGCTCGACGTCGTC 
GGCGAGCCCAGCTACCTCGAGAACCCCAAGGACCACCTGGAGCTCGGCGAGTCGCTGGGCCT 

GATCGACATGCAGCGCGGCGCCAAGGTGTCGGGTTCACGGTTCTACTTCCTGACCGGTCGGGG
TGCCCTACTGCAGCTTGGATTGCTGCAGCTGGCGCTGAAGCTAGCCGTCGACAACGGCTTTGTC 
CCTACGATCCCGCCGGTGCTGGTGCGCCCGGAAGTGATGGTAGGCACGGGATTTCTAGGCGCC 
CACGCCGAGGAGGTGTACCGGGTAGAGGGCGACGGCCTCTACCTTGTGGGCACCTCCGAGGT 
ACCGCTGGCGGGGTATCACTCCGGCGAGATTCTGGACCTTTCCCGCGGGCCGCTGCGGTATGC 

GGGCTGGTCGTCGTGTTTCCGACGTGAGGCCGGCAGCCATGGCAAGGACACGCGCGGCATCA
TCCGGGTGCACCAGTTCGACAAAGTCGAGGGCTTCGTCTACTGCACACCGGCCGACGCGGAGC 
ACGAACATGAGCGGCTGCTGGGCTGGCAGCGCCAGATGCTGGCACGCATCGAGGTGCCGTAT 
CGGGTCATCGACGTGGCCGCGGGTGATCTCGGCTCGTCGGCCGCCCGCAAGTTCGACTGCGA 
GGCGTGGATTCCGACGCAGGGGGCCTATCGCGAGCTGACGTCGACGTCGAACTGCACCACCTT 

TCAGGCGCGCCGGTTGGCGACCCGCTACCGGGATGCCAGCGGCAAGCCGCAGATCGCGGCCA
CCCTCAACGGAACGCTGGCCACCACCCGGTGGCTGGTTGCGATCCTGGAGAACCACCAGCGG 
CCCGACGGCAGCGTTAGAGTCCCGGACGCACTGGTTCCGTTCGTGGGTGTCGAAGTGCTGGAG 
CCGGTCGCTTAG 

>Rv3907c pcnA polynucleotide polymerase TB.seq 4391631:4393070 MW:53057
>emb|AL123456|MTBH37RV:c4393070-4391628, pcnA SEQ ID NO:146 
GTGCCGGAAGCCGTCCAGGAAGCCGATCTGCTAACCGCCGCTGCGGTTGCCTTGAACAGGCAT
GCTGCCTTATTGCGGGAACTCGGGTCGGTGTTCGCCGCCGCGGGACACGAGTTGTATCTGGTC 
GGCGGTTCGGTGCGAGATGCACTGTTGGGCCGGTTGAGCCCCGACCTGGACTTCACCACCGAC 
GCCCGTCCCGAGCGGGTGCAGGAGATCGTGCGGCCGTGGGCCGATGCGGTGTGGGATACCG 
GAATCGAATTCGGCACCGTCGGCGTGGGTAAGAGCGACCACCGCATGGAGATCACCACATTCC
GTGCCGACAGCTACGACCGGGTTTCGCGTCATCCAGAGGTACGTTTCGGCGATTGCCTCGAGG 
GCGATCTGGTCCGCCGCGACTTCACCACGAACGCAATGGCTGTGCGCGTCACCGCCACTGGGC 
CGGGCGAATTCCTGGATCCGCTTGGTGGCTTGGCGGCGCTGCGGGCCAAGGTGTTAGACACCC 
CGGCGGCGCCGTCGGGGTCCTTTGGCGACGATCCGTTGCGGATGCTGCGCGCCGCGCGGTTC 

GTCTCGCAACTTGGATTCGCGGTGGCGCCGCGGGTGCGCGCGGCGATCGAAGAGATGGCGCC
GCAGTTGGCCCGAATCAGCGCCGAACGGGTGGCCGCCGAGCTGGACAAGCTGCTGGTCGGTG 
AGGATCCGGCCGCGGGTATCGACCTGATGGTGCAGAGCGGTATGGGTGCTGTGGTCTTGCCTG 
AAATCGGTGGGATGCGGATGGCGATCGACGAACATCACCAGCACAAGGACGTCTATCAGCATTC 
CTTGACCGTGCTGCGGCAGGCGATCGCGCTGGAGGACGACGGCCCGGATCTGGTGTTGCGCT 

GGGCGGCGCTGCTGCACGACATCGGCAAGCCCGCCACCCGCCGTCACGAACCCGACGGTGGG
GTGAGCTTCCATCACCACGAAGTGGTCGGCGCCAAGATGGTGCGCAAGCGGATGCGGGCGCT 
GAAGTATTCCAAGCAGATGATCGACGACATCTCGCAGCTGGTCTACCTGCATCTGCGGTTTCAC 
GGCTACGGCGATGGGAAATGGACCGACTCTGCGGTGCGCCGCTATGTCACCGACGCCGGGGC 
CCTACTGCCACGGCTGCACAAGCTGGTGCGCGCCGACTGCACGACCCGCAACAAGCGCCGGG 

CCGCGCGGTTGCAGGCCAGTTACGACCGGCTGGAAGAGCGGATCGCGGAGCTGGCCGCCCAG
GAGGATCTGGATCGGGTGCGCCCCGACCTGGACGGCAACCAGATCATGGCGGTGCTCGACATT 
CCGGCGGGCCCGCAAGTCGGCGAGGCGTGGCGCTACTTGAAGGAGCTGCGGCTAGAGCGCG 
GCCCGTTGTCCACCGAGGAGGCGACAACCGAGCTGCTGTCCTGGTGGAAATCACGGGGGAAC 
CGCTAG 


TABLE 4 

>Rv0002 dnaN DNA polymerase III, b-subunit TB.seq 2052:3257 MW:42114 SEQ ID NO:147
MDAATTRVGLTDLTFRLLRESFADAVSWVAKNLPARPAVPVLSGVLLTGSDNGLTISGFDYEVSAEA 
QVGAEIVSPGSVLVSGRLLSDITRALPNKPVDVHVEGNRVALTCGNARFSLPTMPVEDYPTLPTLPEE 
TGLLPAELFAEAISQVAIAAGRDDTLPMLTGIRVEILGETWLAATDRFRLAVRELKWSASSPDIEAAVL
VPAKTLAEAAKAGIGGSDVRLSLGTGPGVGKDGLLGISGNGKRSTTRLLDAEFPKFRQLLPTEHTAVA 
TMDVAELIEAIKLVALVADRGAQVRMEFADGSVRLSAGADDVGRAEEDLWDYAGEPLTIAFNPTYLT 
DGLSSLRSERVSFGFTTAGKPALLRPVSGDDRPVAGLNGNGPFPAVSTDYVYLLMPVRLPG 

>Rv0003 recF DNA replication and SOS induction TB.seq 3280:4434 MW:42181 SEQ ID NO:148
\m/RHLGLRDFRSWACVDLELHPGRWFVGPNGYGKTNLIEALWYSTTLGSHRVSADLPLIRVGTDR 
AVISTIWNDGRECAVDLEIATGRVNKARLNRSSVRSTRDWGVLRAVLFAPEDLGLVRGDPADRRR 
YLDDLAIVRRPAIAAVRAEYERVLRQRTALLKSVPGARYRGDRGVFDTLEVWDSRLAEHGAELVAARI
DLVNQLAPEVKKAYQLLAPESRSASIGYRASMDVTGPSEQSDIDRQLLAARLLAALAARRDAELERG 
VCLVGPHRDDLILRLGDQPAKGFASHGEAWSLAVALRLAAYQLLRVDGGEPVLLLDDVFAELDVMRR 
RALATAAESAEQVLVTAAVLEDIPAGWDARRVHIDVRADDTGSMSWLP 


>Rv0005 gyrB DNA gyrase subunit B TB.seq 5123:7264 MW:78441 SEQ ID NO:149 
MGKNEARRSALAPDHGTWCDPLRRLNRMHATPEESIRIVAAQKKKAQDEYGAASITILEGLEAVRKR 
PGMYIGSTGERGLHHLIWEWDNAVDEAMAGYATWNWLLEDGGVEVADDGRGIPVATHASGIPTV 
DWMTQLHAGGKFDSDAYAISGGLHGVGVSWNALSTRLEVEIKRDGYEWSQVYEKSEPLGLKQGA 
PTKKTGSTVRFWADPAVFETTEYDFEWARRLQEMAFLNKGLTINLTDERVTQDEWDEWSDVAEA
PKSASERAAESTAPHKVKSRTFHYPGGLVDFVKHINRTKNAIHSSIVDFSGKGTGHEVEIAMQWNAG 
YSESVHTFANTINTHEGGTHEEGFRSALTSWNKYAKDRKLLKDKDPNLTGDDIREGLAAVISVKVSE 
PQFEGQTKTKLGNTEWSFVQKVCNEQLTHWFEANPTDA 

SATDIGGLPGKLADCRSTDPRKSELYWEGDSAGGSAKSGRDSMFQAILPLRGKIINVEKARIDRVLK 
NTEVC^IITALGTGIHDEFDIGKLRYHKIN^MADADVDGQHISTLLLTLLFRFMRPLIENGHVFLAQPPLY
KLKWQRSDPEFAYSDRERDGLLEAGLKAGKKINKEDGIQRYKGLGEMDAKELWETTMDPSVRVLRQ 
VTLDDAAAADELFSILMGEDVDARRSFITRNAKDVRFLDV 

>Rv0006 gyrA DNA gyrase subunit A TB.seq 7302:9815 MW:92276 SEQ ID NO:150
MTDTTLPPDDSLDRIEPVDIEQEMQRSYIDYAMSVIVGRALPEVRDGLKPVHRRVLYAMFDSGFRPD
RSHAKSARSVAETMGNYHPHGDASIYDSLVRMAQPWSLRYPLVDGQGNFGSPGNDPPAAMRYTEA 
RLTPI^MEMLREIDEEWDFIPNYDGRVQEPTVLPSRFPNLI_ANGSGGIAVGMATNIPPHNLREI_ADA 
VFWALENHDADEEETLAAVMGRVKGPDFPTAGLIVGSQGTADAYKTGRGSIRMRGWEVEEDSRG 
RTSLVITELPYQVNHDNFITSIAEQVRDGKLAGISNIEDQSSDRVGLRIVIEIKRDAVAKWINNLYKHTQ 
LQTSFGANMU\IVDGVPRTLRLDQLIRYWDHQLDVIVRRTTYRLRKANERAHILRGLVKALDALDEVI
ALIRASETVDIARAGLIELLDIDEIQAQAILDMQLRRLAALERQRIIDDLAKIEAEIADLEDILAKPERQRGI
VRDELAEIVDRHGDDRRTRIIAADGDVSDEDLIAREDWVTITETGYAKRTKTDLYRSQKRGGKGVQG 
AGLKQDDIVAHFFVCSTHDLILFFTTQGRWRAKAYDLPEASRTARGQHVANLLAFQPEERIAQVIQIR 
GYTDAPYLVLATRNGLVKKSKLTDFDSNRSGGIVAVNLRDNDELVGAVLCSAGDDLLLVSANGQSIR 
FSATDEALRPMGRATSGVQGMRFNIDDRLLSLNNA/REGTYLLVATSGGYAKRTAIEEYPVQGRGGK
GVLWMYDRRRGRLVGALIVDDDSELYAVTSGGGVIRTAARQVRKAGRQTKGVRLMNLGEGDTLLAI 
ARNAEESGDDNAVDANGADQTGN 

>Rv0014c pknB serine-threonine protein kinase TB.seq 15593:17470 MW:66511 SEQ ID NO:151
MTTPSHLSDRYELGEILGFGGMSEVHLARDLRLHRDVAVKVLRADLARDPSFYLRFRREAQNAAALN
HPAIVAWDTGEAETPAGPLPYIVMEWDGmRDIVHTEGPMTPKRAIEVIADACQALNFSHQNGIIH 
RDVKPANIMISATNAVKVMDFGIARAIADSGNSVTQTAAVIGTAQYLSPEQARGDSVDARSDVYSLGC 
VLYEVLTGEPPFTGDSPVSVAYQHVREDPIPPSARHEGLSADLDAWLKALAKNPENRYQTAAEMRA
DLVRVHNGEPPEAPKVLTDAERTSLLSSAAGNLSGPRTDPLPRQDLDDTDRDRSIGSVGRWVAWA 
VI^VLTVVVTIAINTFGGITRDVQVPDVRGQSSADAIATLQNRGFKIRTLQKPDSTIPPDHVIGTDPAAN 
TSVSAGDEITVNVSTGPEQREIPDVSTLTYAEAVKKLTAAGFGRFKQANSPSTPELVGKVIGTNPPAN
QTSAITNWIIIVGSGPATKDIPDVAGQTVDVAQKNLNVYGFTKFSQASVDSPRPAGEVTGTNPPAGT
WPVDSVIELQVSKGNQFVMPDLSGMFWVDAEPRLRALGWTGMLDKGADVDAGGSQHNRWYQN 
PPAGTGVNRDGIITLRFGQ



>Rv0016c pbpA TB.seq 18762:20234 MW:51577 SEQ ID NO:152 

MNASLRRISVTVMALIVLLLLNATMTQVFTADGLRADPRNQRVLLDEYSRQRGQITAGGQLLAYSVAT
DGRFRFLRVYPNPEVYAPVTGFYSLRYSSTALERAEDPILNGSDRRLFGRRLADFFTGRDPRGGNV 
DTTINPRIQQAGWDAMQQGCYGPCKGAWALEPSTGKILALVSSPSYDPNLLASHNPEVQAQAWQR 
LGDNPASPLTNRAISETYPPGSTFKVITTAAALAAGATETEQLTAAPTIPLPGSTAQLENYGGAPCGDE 
PTVSLREAFVKSCNTAFVQLGIRTGADALRSMARAFGLDSPPRPTPLQVAESTVGPIPDSAALGMTSI 

GQKDVALTPLANAEIAATIANGGITMRPYLVGSLKGPDLANISTTVGYQQRRAVSPQVAAKLTELMVG
AEKVAQQKGAIPGVQIASKTGTAEHGTDPRHTPPHAWYIAFAPAQAPKVAVAVLVENGADRLSATGG 
ALAAPIGRAVIEAALQGEP 

>Rv0017c rodA TB.seq 20234:21640 MW:50612 SEQ ID NO:153 

MTTRLCMPVAWPPLPTRRNAELLLLCFAAVITFAALLWQANQDQGVPWDLTSYGLAFLTLFGSAHL
AIRRFAPYTDPLLLPWALLNGLGLVMIHRLDLVDNEIGEHRHPSANQQMLWTLVGVAAFALVVTFLK 
DHRQLARYGYICGLAGLVFLAVPALLPAALSEQNGAKIWIRLPGFSIQPAEFSKILLLIFFSAVLVAKRG 
LFTSAGKHLLGMTLPRPRDLAPLU^WVISVGVMVFEKDLGASLaYTSFL\A/\AlATQRFSWWIGL 
TLFAAGTLVAYFIFEHVRLRVQTWLDPFADPDGTGYQIVQSLFSFATGGIFGTGLGNGQPDTVPAAST 

DFIIAAFGEELGLVGLTAILMLYTIVIIRGLRTAIATRDSFGKLLAAGLSSTLAIQLFIWGGVTRLIPLTGLT
TPWMSYGGSSLLANYILLAILARISHGARRPLRTRPRNKSPITAAGTEVIERV 



>Rv0018c ppp TB.seq 21640:23181 MW:53781 SEQ ID NO:154

VARWLVLRYAARSDRGLVRANNEDSVYAGARLLALADGMGGHAAGEVASQLVIAALAHLDDDEPG 
GDLLAKLDAAVRAGNSAIAAQVEMEPDLEGMGTTLTAILFAGNRLGLVHIGDSRGYLLRDGELTQITK
DDTFVQTLVDEGRITPEEAHSHPQRSLIMRALTGHEVEPTLTMREARAGDRYLLCSDGLSDPVSDETI 
LEALQIPEVAESAHRLIELALRGGGPDNVTVWADWDYDYGQTQPILAGAVSGDDDQLTLPNTAAG 
RASAISQRKEIVKRVPPQADTFSRPRWSGRRLAFWALVTVLMTAGLLIGRAIIRSNYYVADYAGSVSI 
MRGIQGSLLGMSLHQPYLMGCLSPRNELSQISYGQSGGPLDCHLMKLEDLRPPERAQVRAGLPAGT 
LDDAIGQLRELAANSLLPPCPAPRATSPPGRPAPPTTSETTEPNVTSSPASPSPTTSAPAPTGTTPAIP
TSASPAAPASPPTPWPVTSSPTMAALPPPPPQPGIDCRAAA 
>Rv0019c - TB.seq 23273:23737 MW:17153 SEQ ID NO:155
MQGLVLQLTRAGFLMUJWFIWSVLRIL^ 

EGALTGARITLSEQPVLIGRADDSTLVLTDDYASTRHARLSMRGSEWWEDL^ 
AVRVPIGTPVRIGKTAIELRP

>Rv0020c - TB.seq 23864:25444 MW:56881 SEQ ID NO:156 

MGSQKRLVQRVERKLEQTVGDAFARIFGGSIVPQEVEALLRREAADGIQSLQGNRLLAPNEYIITLGV 

HDFEKLGADPELKSTGFARDLADYIQEQGWQTYGDNAA^FEQSSNLHTGQFF^RGWNPDVETHP 

PVIDCARPQSNHAFGAEPGVAPMSDNSSYRGGQGQGRPDEYYDDRYARPQEDPRGGPDPQGGS 

DPRGGYPPETGGYPPQPGYPRPRHPDQGDYPEQIGYPDQGGYPEQRGYPEQRGYPDQRGYQDQ 

GRGYPDQGQGGYPPPYEQRPPVSPGPAAGYGAPGYDQGYRQSGGYGPSPGGGQPGYGGYGEY 

GRGPARHEEGSYVPSGPPGPPEQRPAYPDQGGYDQGYQQGATTYGRQDYGGGADYTRYTESPR 

VPGYAPQGGGYAEPAGRDYDYGQSGAPDYGQPAPGGYSGYGQGGYGSAGTSVTLQLDDGSGRT 

YQLREGSNIIGRGQDAQFRLPDTGVSRRHLEIRWDGQVALLADLNSTNGTTVNNAPVQEWQLADGD 

VIRLGHSEIIVRMH 

>Rv0032 bioF2 C-terminal similar to B. subtilis BioF TB.seq 34295:36607 MW:86245
SEQ ID NO:157 

MPTGLGYDFLRPVEDSGINDLKHYYFMADLADGQPLGRANLYSVCFDLATTDRKLTPAWRTTIKRWF 

PGF^FRFLECGLLTMVSNPLALRSDTDLERVLPVLAGQMDQLAHDDGSDFLMIRDVDPEHYQRYL 

DILRPLGFRPALGFSRVDTTISWSSVEEALGCLSHKRRLPLKTSLEFRERFGIEVEELDEYAEHAPVLA 

RLWRNVKTEAKDYQREDLNPEFFAACSRHLHGRSRLWLFRYQGTPIAFFLNVWGADENYILLEWGI 

DRDFEHYRKANLYRAALMLSLKDAISRDKRRMEMGITNYFTKLRIPGARVIPTIYFLRHSTDPVHTATL

ARMMMHNIQRPTLPDDMSEEFCRWEERIRLDQDGLPEHDIFRKIDRQHKYTGLKLGGVYGFYPRFT 

GPQRSTVKAAELGEIVLLGTNSYLGLATHPEWEASAEATRRYGTGCSGSPLLNGTLDLHVSLEQEL 

ACFLGKPAAVLCSTGYQSNLAAISALCESGDMIIQDALNHRSLFDAARLSGADFTLYRHNDMDHLARV 

LRRTEGRRRIIWDAVFSMEGTVADLATIAELADRHGCRVYVDESHALGVLGPDGRGASAALGVLAR 

MDWMGTFSKSFASVGGFIAGDRPWDYIRHNGSGHVFSASLPPAAAAATHAALRVSRREPDRRAR 

VL^AAEYMATGLARQGYQAEYHGTAIVPVILGNPWAHAGYLRLMRSGVWNPVAPPAVPEERSGFR 

TSYLADHRQSDLDRALHVFAGLAEDLTPQGAAL 

>Rv0050 ponA1 TB.seq 53661:55694 MW:71119 SEQ ID NO:158

WILLPMVTFTMAYLIVDVPKPGDIRTNQVSTILASDGSEIAKIVPPEGNRVDVNLSQVPMHVRQAVIAA 

EDRNFYSNPGFSFTGFARAVKNNLFGGDLQGGSTITQQYVKNALVGSAQHGWSGLMRKAKELVIAT 

KMSGEWSKDDVLQAYLNIIYFGRGAYGISAASKAYFDKPVEQLTVAEGALLAALIRRPSTLDPAVDPE 

GAHARWNV\M_DGMVETKALSPNDRAAQVFPEWPPDLARAENQTKGPNGLIERQVTRELLELFNI^ 

EQTLNTQGLVVTTTIDPQAQRAAEKAVAKYLDGQDPDMRAAWSIDPHNGAVRAYYGGDNANGFDF 
AQAGLQTGSSFKVFALVAALEQGIGLGYQVDSSPLTVDGIKITNVEGEGCGTCNIAEALKMSLNTSYY

RLMLKLNGGPC^VADAAHQAGIASSFPGVAHTLSEDGKGGPPNNGIVLGQYQTRVIDMASAYATLAA 

SGIYHPPHFVQKWSANGQVLFDASTADNTGDQRIPKAVADNVTAAMEPIAGYSRGHNLAGGRDSA 

AKTGTTQFGDTTANKDAWMVGYTPSLSTAVWVGTWGDEPLVTASGAAIYGSGLPSDIWK^ 

LKGTSNETFPKPTEVGGYAGVPPPPPPPEVPPSEWIQPWEIAPGITIPIGPPTTITLAPPPPAPPAAT 

PTPPP 

>Rv0051 - TB.seq 55694:57373 MW:61210 SEQ ID NO:159

VTGALSQSSNISPLPL^ADLRSADNRDCPSRTDVLGAALANWGGPVGRHALIGRTRLMTPLRVMFAI 

ALVFLALGWSTKMCLQSTGTGPGDQRVANWDNQRAYYQLCYSDTVPLYGAELLSQGKFPYKSSWI 

ETDSNGTPQLRYDGQIAVRYMEYPN^TGIYQYLSMAIAKTYTALSKVAPLPWAEWMFFNV 

LAWLTTVWATSGLAGRRIWDAALVAASPLVIFQIFTNFDALATGLATSGLLAWARRRPVLAGVLIGLG

SMKLYPLLFLYPLLLLGIRAGRLNALART^MAAMTWLLVNLPVMLLFPRGWSEFFRLNTRRGDDM 

DSLYNWKSFTGWRGFDPTLGFWEPPLVLNTWTLLF^ 

LVNKVWSPQFSLWLVPI^VLALPHRRILLAWMTIDALVVVVPRMYYLYGNPS 

IAVMVLCGLWWQIYRPGRDLVRTGGPGALPACGGVDDPVGGVFANAADAPPGRLPSWLRPRLGD 
EHARERTPDAGRDRTFSGQHRA 

>Rv0106 - TB.seq 124372:125565 MW:43701 SEQ ID NO:160 

MRTPVILVAGQDHTDENTTGALLRRTGTWVEHRFDGHWRRMTATLSRGELITTEDALEFAHGCVSC 

TIRDDLLVLLRRLHRRDNVGRIWHU^PWLEPQPICWAIDHVRVCVGHGYPDGPAALDVRVAAVVTC 

VDCVRWLPQSLGEDELPDGRWAQVWGC^EFADLLVLTHPEPVAVAWRRLAPRARITGGVDRVEL 

ALAHLDDNSRRGRTDTPHTPLLAGLPPUVADGEVAIVEFSARRPFHPQRLHAAVDLLLDGWRTRGR 

LWUVNRPDQVMWLESAGGGLRVASAGKWLAAMAASEVAWDLERRLFADLMWVYPFGDRHTAMT 

VLVCGADPTDIVNALNAALLSDDEMASPQRWQSYVDPFGDWHDDPCHEMPDAAGEFSAHRNSGES 

R 

>Rv0125 - TB.seq 151146:152210 MW:34927 SEQ ID NO:161

MSNSRRRSLRWSWLLSVLAAVGLGLATAPAQAAPPALSQDRFADFPALPLDPSAMVAQVGPQWNI 
NTKLGYNNAVGAGTGIVIDPNGWLTNNHVIAGATDINAFSVGSGQTYGVDWGYDRTQDVAVLQLR 
GAGGLPSAAIGGGVAVGEPWAMGNSGGQGGTPRAVPGRWALGQTVQASDSLTGAEETLNGLIQ 
FDAAIQPGDSGGPWNGLGQWGMNTAASDNFQLSQGGQGFAIPIGQAMAIAGQIRSGGGSPTVHI 
GPTAFLGLGWDNNGNGARVQRWGSAPAASLGISTGDVITAVDGAPINSATAMADALNGHHPGDVI 
SVTWQTKSGGTRTGNVTLAEGPPA 

>Rv0350 dnaK 70 kD heat shock protein, chromosome replication TB.seq 419833:421707
MW:66832 SEQ ID NO:162
MARAVGIDLGTTNSWSVLEGGDPNA/VANSEGSRTTPSIVAFARNGEVLVGQPAKNC^VTIWDRW
RSVKRHMGSDWSIEIDGKKYTAPEISARILMKLKRDAEAYLGEDITDAVITTPAYFNDAQRQATKDAG
QIAGLNVLRIVNEPTAAALAYGLDKGEKEQRILVFDLGGGTFDVSLLEIGEGWEVRATSGDNHLGGD 
DWDQRWDWLVDKFKGTSGIDLTKDKMAMQRLREAAEKAKIELSSSQSTSINLPYITVDADKNPLFLD 
EQLTRAEFQRITQDLLDRTRKPFQSVIADTGISVSEIDHNA/LVGGSTRMPAVTDLVKELTGGKEPNKG
VNPDEWAVGAALQAGVLKGEVKDVLLLDVTPLSLGIETKGGVMTRLIERNTTIPTKRSETFTTADDN 
QPSVQIQVYQGEREIAAHNKLLGSFELTGIPPAPRGIPQIEVTFDIDANGIVHVTAKDKGTGKENTIRIQ
EGSGLSKEDIDRMIKDAEAHAEEDRKRREEADVRNQAETLVYQTEKFVKEQREAEGGSKVPEDTLN 
KVDAAVAEAKAALGGSDISAIKSAMEKLGQESQALGQAIYEAAQAASQATGAAHPGGEPGGAHPGS 
ADDWDAEWDDGREAK

>Rv0351 grpE stimulates DnaK ATPase activity TB.seq 421707:422411 MW:24501
SEQ ID NO:163

VTDGNQKPDGNSGEQVTWDKRRIDPETGEVRHVPPGDMPGGTAAADAAHTEDKVAELTADLQRV 
QADFANYRKRALRDQQAAADRAKASWSQLLGVLDDLERARKHGDLESGPLKSVADKLDSALTGLG
LVAFGAEGEDFDPVLHEAVQHEGDGGQGSKPVIGTVMRQGYQLGEQVLRHALVGWDTVWDAAE 
LESVDDGTAVADTAENDQADQGNSADTSGEQAESEPSGS 

>Rv0352 dnaJ acts with GrpE to stimulate DnaK ATPase TB.seq 422450:423634 MW:41346
SEQ ID NO:164

MAQREWVEKDFYQELGVSSDASPEEIKRAYRKLARDLHPDANPGNPAAGERFKAVSEAHNVLSDPA 
KRKEYDETRRLFAGGGFGGRRFDSGFGGGFGGFGVGGDGAEFNLNDLFDAASRTGGTTIGDLFGG 
LFGRGGSARPSRPRRGNDLETETELDFVEAAKGVAMPLRLTSPAPCTNCHGSGARPGTSPKVCPTC 
NGSGVINRNQGAFGFSEPCTDCRGSGSIIEHPCEECKGTGVTTRTRTINVRIPPGVEDGQRIRLAGQ 
GEAGLRGAPSGDLYVWHVRPDKIFGRDGDDLTVWPVSFTELALGSTLSVPTLDGWGVRVPKGTA
DGRILRVRGRGVPKRSGGSGDLLVTVKVAVPPNLAGAAQEALEAYAAAERSSGFNPRAGWAGNR 
>Rv0363c fba fructose bisphosphate aldolase TB.seq 441266:442297 MW:36545 
SEQ ID NO:165

MPIATPEVYAEMLGQAKQNSYAFPAINCTSSETVNAAIKGFADAGSDGIIQFSTGGAEFGSGLGVKDM 
WGAVALAEFTHVIAAKYPVNVALHTDHCPKDKLDSYVRPLLAISAQRVSKGGNPLFQSHMWDGSAV
PIDENLAIAQELLKAAAAAKIILEIEIGWGGEEDGVANEINEKLYTSPEDFEKTIEALGAGEHGKYLLAA 
TFGNVHGVYKPGrWKLRPDILAQGQQVAAAKLGLPADAKPFDFVFHGGSGSLKSEIEEALRYGWKM 
NVDTDTQYAFTRPIAGHMFTNYDGVLKVDGEVGVKKVYDPRSYLKKAEASMSQRWQACNDLHCA 
GKSLTH 

>Rv0405 pks6 TB.seq 485729:489934 MW:147615 SEQ ID NO:166

MTDGSVTADKLQKWFREYLSTHIECHPNEVSLDVPIRDLGLKSIDVI^IPGDLGDRFGFCIPDLA\A/VD 
NPSANDLIDSLLNQRSADSLRESHGHADRNTQGRGSINEPVAVIGVGCRFPGDIDGPERLWDFLTEK
KCAITAYPDRGFTNAGTFAESGGFLKDVAGFDNRFFDIPPDEALRMDPQQRLLLEVSWEALEHAGIIP
ESLRLSRTGVFVGVSSTDYVRLVSASAQQKSTIWDNTGGSSSIIANRISYFLDIQGPSIVIDTACSSSLV 
AVHLACRSLSTWDCDIALVGGTNVLISPEPWGGFREAGILSQTGCCHAFDKSADGMVRGEGCGVIVL 
QRLSDARLEGRRILAILTGSAVNQDGKSNGIMAPNPSAQIGVLENACKSARVDPLEIGYVEAHGTGTS 
LGDRIEAHALGMVFGRKRPGSGPLMIGSIKPNIGHLEGAAGIAGLIKAVLMVERGSLLPSGGFTEPNP
AIPFTELGLRWDELQEWPWAGRPRRAGVSSFGFGGTNAHVIVEEAGSVGADTVSGRADVGGSGG
GWAWVISGKTASALAAC^GRLGRWRARPALDWDVGYSLVSTRSVFDH^ 
AGWAGRPEAGWCGVGKPAGKTAFVFAGQGSQWLGMGSELYAAYPVFAEALDAWDELDRHLRY 
PLRDVIWGHDQDLLNTTEFAQPALFAVEVALYRLLMSWGVRPGLVLGHSVGELAAAHVAGALCLPD 

AAMLVAARGRLMQALPAGGAMFAVC^REDEVAPMLGHDVSIAAVNGPASWISGAHDAVSAIADRL
RGQGRRVHRLAVSHAFHSALMEPMIAEFTAVAAELSVGLPTIPVISNVTGQLVADDFASADYWARHIR 
AWRFGDSVRSAHCAGASRFIEVGPGGGLTSLIEASLADAQIVSVPTLRKDRPEPVSVMTAAAQGFV 
SGMGLDWASVFSGYRPKRVELPTYAFQHQKFWLAPAPSVSDPTAAGQIGASDGGAELLASSGFAA 
RLAGRSADEQLAAAIEWCEHAAAVLGRDGAAGLDAGQAFADSGFNSLSAVELRNRLTAVTAVTLPA 

TAIFDHPTPTELAQYLITQIDGHGSSAAAAANPAERIDALTDLFLQACDAGRDADGWKMVALASNTRE
RMSSPVRNNVSKNVALU^DGISDVWICIPTLTVLSDQREYRDIANAMTGRHSWSLTLPGFDSSDAL 
PQNADMIVETVSNAIIDWGGSCRFVLSGYSSGGVLAYALCSHLSVKHQRNPLGVALIDTYLPSQIAN 
PSMNEGFSPNDTGKGLSREVIRVARMLNRLTATRLTAAATYAAIFQAWEPGRSMAPVLNIVAKDRIAT 
VENLREERINRWRTAAAEAAYSVAEVPGDHFGMMSTSSEAIATEIHDWISGLVRGPHR 


>Rv0435c - ATPase of AAA-family TB.seq 522348:524531 MW:75315 SEQ ID NO:167

VTHPDPARQLTLTARLNTSAVDSRRGWRLHPNAIAALGIREWDAVSLTGSRTTAAVAGLAAADTAV 

GWLLDDVTLSNAGLREGTEVIVSPVTWGARSWLSGSTU^^ 

LPRDLGPGTSTSAASRALAAAVGISWTSELLTVTGVDPDGPVSVQPNSLVTVVGAGVPAAM 
QVSISSPEIQIEELKGAQPQAAKLTEWLKl^DEPHLLQTLGAGTNLGVLVSGPAGVGKATLVRAVCD
GRRLVTLDGPEIGALAAGDRVKAVASAVQAVRHEGGVLLITDADALLPAAAEPVASLILSELRTAVATA 
GWLIATSARPDQLDARLRSPELCDRELGLPLPDAATRKSLLEALLNPVPTGDLNLDEIASRTPGFWA 
DU\ALVREAALRAASRASADGRPPMLHQDDLLGALWIRPLSRSASDEVWGDWLDDVGDMAAAK 
QALTEAVLWPLQHPDTFARLGVEPPRGVLLYGPPGCGKTFW 
GSSEKAVRELFRRARDSAPSLVFLDELDALAPRRGQSFDSGVSDRWAALLTELDGIDPLRDWMLG
ATNRPDLIDPALLRPGRLERLVFVEPPDAAARREILRTAGKSIPLSSDVDLDEVAAGLDGYSAADCVAL 
LREAALTAMRRSIDAANVTAADLATARETVRASLDPLQVASLRKFGTKGDLRS 

>Rv0436c pssA CDP-diacylglycerol-serine o-phosphatidyltransferase TB.seq 524531:525388
MW:31219 SEQ ID NO:168

MIGKPRGRRGVNLQILPSAMTVLSICAGLTAIKFALEHQPKAAMALIAAAAILDGLDGRVARILDAQSR 
MGAEIDSLADAVNFGVTPALVLWSMLSKWPVGWWVLLYAVCWLRL^RYNALQDDGTQPAYAHE 
FFVGMPAPAGAVSMIGLLALKMQFGEGV\Mfi"SGWFLSF

ALLAVI^CAAAAVLAPYLLIWVIMAYMCHIPFAVRSQRWLAQH^ 

YRPSMARLGLRKPGRRL 

>Rv0440 groEL2 60 kD chaperonin 2 TB.seq 528606:530225 MW:56728 SEQ ID NO:169
MAKTIAYDEEARRGLERGLNALADAVKVTLGPKGRNNA/LEKKWGAPTITNDGVSIAKE 
GAELVKEVAKKTDDVAGDGTTTAWl^QALVR 

EVETKEQIAATAAISAGDQSIGDLIAEAMDKVGNEGVITVEESNTFGLQLELTCGMRFDKGYISGYFVT 
DPERQEAVLEDPYILLVSSKVSTVKDLLPLLEKVIGAGKPLLIIAEDVEGEALSTLWNKIRGTFKSVAVK 
APGFGDRRKAMLQDMAILTGGQVISEEVGLTLENADLSLLGKARKVNA/TKDETTIVEGAGDTDAIAGR
VAQIRQEIENSDSDYDREKLQERLAKLAGGVAVIKAGAATEVELKERKHRIEDAVRNAKAAVEEGIVA 
GGGmLQAAPTLDELKLEGDEATGANIV^ALEAPLKQIAFNSGLEPGWAEKVRNLPAGHGLNAQT 
GWEDLLAAGVADPVKVTRSALQNAASIAGLFLTTEAWADKPEKEKASVPGGGDMGGMDF 

>Rv0482 murB TB.seq 570537:571643 MW:38522 SEQ ID NO:170

MKRSGVGSLFAGAHIAEAVPLAPLTTLRVGPIARRVITCTSAEQWAALRHLDSAAKTGADRPLVFAG 
GSNLVIAENLTDLTWRLANSGITIDGNLVRAEAGAVFDDWVRAIEQGLGGLECLSGIPGSAGATPVQ
NVGAYGAEVSDTITRVRLLDRCTGEVRWVSARDLRFGYRTSVLKHADGLAVPTVVLEVEFALDPSGR 
SAPLRYGELIAALNATSGERADPQAVREAVLALRARKGMVLDPTDHDTWSVGSFFTNPWTQDVYE 

RLAGDAATRKDGPVPHYPAPDGVKLAAGWLVERAGFGKGYPDAGAAPCRLSTKHALALTNRGGAT
AEDWTLARAVRDGVHDVFGITLKPEPVLIGCML 

>Rv0483 - TB.seq 571708:573060 MW:47859 SEQ ID NO:171 
WIRVLFRPVSUPVNNSSTPQSQGPISRRLALTALGFGVI^PN 
RPADSAADWPIAPISVEVGDGWFQRVALTNSAGKWAGAYSRDRTIYTITEPLGYDTTYTWSGSAV
GHDGKAVPVAGKFTWAPVKTINAGFQLADGQWGIAAPVIIQFDSPISDKAAVERALTV^ 
WAWLPDEAQGARVHWRPREYYPAGTTVDVDAKLYGLPFGDGAYGAQDMSLHFQIGRRQWKAEV 
SSHRIQWTDAGVIMDFPCSYGEADLARNWRNGIHWTEKYSDFYM 

NGEFIHANPMSAGAQGNSNVTNGCINLSTENAEQYYRSAVYGDPVEVTGSSIQLSYADGDIWDWAV 
DWDTWVSMSALPPPAAKPAATQIPVTAPVTPSDAPTPSGTPTTTNGPGG

>Rv0489 gpm phosphoglycerate mutase I TB.seq 578424:579170 MW:27217 SEQ ID NO:172 
MANTGSLVLLRHGESDWNALNLFTGWVDVGLTDKGQAEAVRSGELIAEHDLLPDVLYTSLLRRAITT 
AHLALDSADRLWIPVRRSWRLNERHYGALQGLDKAETKARYGEEQFMAWRRSYDTPPPPIERGSQ 
FSQDADPRYADIGGGPLTECLADWARFLPYFTDVIVGDLRVGKTVLIVAHGNSLRALVKHLDQMSDD
EIVGLNIPTGIPLRYDLDSAMRPLVRGGTYLDPEAAAAGAAAVAGQGRG 
>Rv0490 senX3 sensor histidine kinase TB.seq 579347:580576 MW:44794 SEQ ID NO:173
VWFSALLLAGVLSALALAVGGAVGMRLTSRWEQRQRVA^^ 

THRDNA/YLNERAKELGLVRDRQLDIX2AWRAARQALGGEDVEFDLSPRKRSATGRSGLSVHGHARL 
LSEEDRRFAWFVHDQSDYARMEAARRDFVANVSHELKTPVGAMALLAEALLASADDSETVRRFAE 
KVLIEANRLGDMVAELIELSRLQGAERLPNMTDVDVDTIVSEAISRHKVAADNADIEVRTDAPSNLRVL
GDQTLLVTALANLVSNAIAYSPRGSLVSISRRRRGANIEIAVTDRGIGIAPEDQERVFERFFRGDKARS 
RATGGSGLGUVIVKHVAANHDGTIRVWSKPGTGSTFTLALPALIEAYHDDERPEQAREPELRSNRSQ 
REEELSR 

>Rv0500 proC pyrroline-5-carboxylate reductase TB.seq 590081:590965 MW:30172
SEQ ID NO:174

MLFGMARIAIIGGGSIGEALLSGLLRAGRQWDLWAERMPDRANYLAQTYSVLVTSAADAVENATFV 
WAVKPADVEPVIADUVNATAAAENDSAEQVFVTWAGITIAYFESKLPAGTPWRAMPNAAALVGAG 
VTALAKGRFVTPQQLEEVSALFDAVGGVLTVPESQLDAVTAVSGSGPAYFFLLVEALVDAGVGVGLS 
RQVATDLAAQTMAGSAAMLLERMEQDQGGANGELMGLRVDLTASRLRAAVTSPGGTTAAALRELE
RGGFRMAVDAAVQAAKSRSEQLRITPE 

>Rv0528 - TB.seq 618303:619889 MW:57132 SEQ ID NO:175

MWRSLTSMGTALVLLFLI^L^AIPGALLPQRGLNAAKVDDYLAAHPLIGPWLDELQAFDVFSSFWFTA 

IYVLLFVSLVGCLAPRTIEHARSLFMTPVAAPRNI^^

QQGDSVEVSAEKGYLREFGNLVFHFALLGLLVAVAVGKLFGYEGNVIVIADGGPGFCSASPAAFDSF 
RAGNWDGTSLHPICVRVNNFQAHYLPSGQATSFAADIDYQADPATADLIANSWRPYRLQVNHPLRV 
GGDRWLQGHGYAPTFTVTFPDGQTRTSTVQWRPDNPQTLLSAGWRIDPPAGSYPNPDERRKHQI 
AIQGLLAPTEQLDGTLLSSRFPALNAPAVAIDIYRGDTGLDSGRPQSLFTLDHRUEQGRLVKEKRVNL 

RAGQQVRIDQGPAAG7WRFDGAVPFVNLQVSHDPGQSVVVLVFAITMMAGLLVSLLVRRRRVWARI
TPTTAGTVNVELGGLTRTDNSGWGAEFERLTGRLLAGFEARSPDMAEAAAGTGRDVD 

>Rv0667 rpoB [beta] subunit of RNA polymerase TB.seq 759805:763320 MW:129220
SEQ ID NO:176

LADSRQSKTAASPSPSRPQSSSNNSVPGAPNRVSFAKLREPLEVPGLLDVQTDSFEWLIGSPRWRE
SAAERGDVNPVGGLEEVLYELSPIEDFSGSMSLSFSDPRFDDVKAPVDECKDKDMTYAAPLFVTAEF 
INNNTGEIKSQWFMGDFPMMTEKGTFIINGTERWVSQLVRSPGWFDETIDKSTDICrLHSVKVIPSR 
GAWLEFDVDKRDTVGVRIDRKRRQPVTVLLKALGWTSEQIVERFGFSEIMRSTLEKDNTVGTDEALL 
DIYRKLRPGEPPTKESAQTLLENLFFKEKRYDLARVGRYKVNKKLGLHVGEPITSSTLTEEDWATIEY 

LVRLHEGQTTMWPGGVEVPVETDDIDHFGNRRLRWGELIQNQIRVGMSRMERWRERMTTQDVE
AITPQTLINIRPWAAIKEFFGTSQLSQFMDQNNPLSGLTHKRRLSALGPGGLSRERAGLEVRDVHPS 
HYGRMCPIETPEGPNIGLIGSLSWARVNPFGFIETPYRKWDGWSDEIWLTADEEDRHWAQANS 
PIDADGRFVEPRVLVRRKAGEVEYVPSSEVDYMDVSPRQMVSVATAMIPFLEHDDANRALMGANMQ
RQAVPLVRSEAPLVGTGMELRAAIDAGDVWAEESGVIEEVSADYITVMHDNGTRRTYRMRKFARSN 
HGTCANQCPIVDAGDRVEAGQVIADGPCTDDGEMALGKNLLVAIMPWEGHNYEDAIILSNRLVEEDV 
LTSIHIEEHEIDARDTKLGAEEITRDIPNISDEVLADLDERGIVRIGAEVRDGDILVGKVTPKGETELTPE 
ERLLRAIFGEKAREVRDTSLKVPHGESGKVIGIRVFSREDEDELPAGVNELVRVYVAQKRKISDGDKL
AGRHGNKGVIGKILPVEDMPFLADGTPVDIILNTHGVPRRMNIGQILETHLGWCAHSGWKVDAAKGV 
PDWAARLPDELLEAQPNAIVSTPVFDGAQEAELQGLLSCTLPNRDGDVLVDADGKAMLFDGRSGEP 
FPYPVTVGYMYIMKLHHLVDDKIHARSTGPYSMITQQPLGGKAQFGGQRFGEMECWAMQAYGAAY 
TLQELLTIKSDDTVGRVKVYEAIVKGENIPEPGIPESFKVLLXELQSLCLNVEVLSSDGAAIELREGEDE 
DLERAAANLGINLSRNESASVEDLA

>Rv0668 rpoC [beta]' subunit of RNA polymerase TB.seq 763368:767315 MW:146740 
SEQ ID NO:177

VLDVNFFDELRIGLATAEDIRQWSYGEVKKPETINYRTLKPEKDGLFCEKIFGPTRDWECYCGKYKRV 

RFKGIICERCGVEVTT^AKVRRERMGHIELAAPVTHIVVYFKGVPSRLGYLLDLAPKDLEKIIYFAAYVITS
VDEEMRHNELSTLEAEMAVERKAVEDQRDGELEARAQKLEADLAELEAEGAKADARRKVRDGGER 
EMRQIRDRAQRELDRLEDIWSTFTKLAPKQLIVDENLYRELVDRYGEYFTGAMGAESIQKLIENFDIDA 
EAESLRDVIRNGKGQKKLRALKRLKWAAFQQSGNSPMGMVLDAVPVIPPELRPMVQLDGGRFATS 
DLNDLYRRVINRNNRLKRLIDLGAPEIIVNNEKRMLQESVDALFDNGRRGRPVTGPGNRPLKSLSDLL 

KGKQGRFRQNLLGKRVDYSGRSVIWGPQLKLHQCGLPKLMALELFKPFVMKRLVDLNHAQNIKSAK
RMVERQRPQVWDVLEEVIAEHPVLLNRAPTLHRLGIQAFEPMLVEGKAIQLHPLVCEAFNADFDGDQ 
MAVHLPLSAEAQAEARILMLSSNNILSPASGRPLAMPRLDMVTGLYYLTTEVPGDTGEYQPASGDHP 
ETGVYSSPAEAIMAADRGVLSVRAKIKVRLTQLRPPVEIEAELFGHSGWQPGDAWMAETTLGRVMF 
NELLPLGYPFVNKQMHKKVQAAIINDUVERYPMIWAQTVDKLKDAGFYWATRSGVTVSMADVLVPP 

RKKEILDHYEERADKVEKQFQRGALNHDERNEALVEIWKEATDEVGQALREHYPDDNPIITIVDSGAT
GNFTQTRTLAGMKGLVTNPKGEFIPRPVKSSFREGLTVLEYFINTHGARKGLADTALRTADSGYLTRR 
LVDVSQDVIVREHDCQTERGIWELAERAPDGTLIRDPYIETSAYARTLGTDAVDEAGNVIVERGQDL 
GDPEIDALLAAGITQVKVRSVLTCATSTGVCATCYGRSMATGKLVDIGEAVGIVAAQSIGEPGTQLTM 
RTFHQGGVGEDITGGLPRVQELFEARVPRGKAPIADVTGRVRLEDGERFYKITIVPDDGGEEWYDKI 

SKRQRLRVFKHEDGSERVLSDGDHVEVGQQLMEGSADPHEVLRVQGPREVQIHLVREVQEVYRAQ
GVSIHDKHIEVIVRQMLRRVTIIDSGSTEFLPGSLIDRAEFEAENRRWAEGGEPAAGRPVLMGITKAS 
LATDSWLSAASFQETTRVLTDAAINCRSDKLNGLKENVIIGKLIPAGTGINRYRNIAVQPTEEARAAAYT 
IPSYEDQYYSPDFGAATGAAVPLDDYGYSDYR

>Rv0711 atsA TB.seq 806333:808693 MW:86216 SEQ ID NO:178

MAPEATEAFNGTIELDIRDSEPDWGPYAAPVAPEHSPNILYLVWDDVGIATWDCFGGLVEMPAMTRV 
AERGVRLSQFHTTALCSPTRASLLTGRNATTVGMATIEEFTDGFPNCNGRIPADTALLPEVLAEHGYN 
TYCVGKWHLTPLEESNMASTKRHWPTSRGFERFYGFLGGETDQWYPDLVYDNHPVSPPGTPEGG

YHLSKDIADKTIERRDAKVIAPDKPWFSYVCPGAGHAPHHVFKEWADRYAGRFDMGYERYREIVLE

RQKALGIVPPDTELSPINPYLDVPGPNGETWPLQDTVRPWDSLSDEEKKLFCRMAEVFAGFLSYTDA 

QIGRILDYLEESGQLDNTIIWISDNGASGEGGPNGSVNEGKFFNGYIDTVAESMKLFDHLGGPQTYN 

HYPIGWAMAFNTPYKLFKRYASHEGGIADPAIISWPNGIAAHGEIRDNYVNVSDITPTVYDLLGMTPP 

GWKGIPQKPMDGVSFIAALADPAADTGKTTQFYTMLGTRGIWHEGWFANTIHAATPAGWSNFNAD 

RWELFHIAADRSQCHDLAAEHPDKLEELKALWFSEAAKYNGLPLADLNLLETMTRSRPYLVSERASY 

VYYPDCADVGIGAAVEIRGRSFAVLADVTIDTTGAEGVLFKHGGAHGGHVLFVRDGRLHYVYNFLGE 

RQQLVSSSGPVPSGRHLLGVRYLRTGTVPNSHTPVGDLELFFDENLVGALTNVLTHPGTFGLAGAAI 

SVGRNGGSAVSSHYEAPFAFTGGTITQVTVDVSGRPFEDVESDLALAFSRD 


>Rv0764c - lanosterol 14-demethylase cytochrome P450 TB.seq 856683:858035 MW:50879 
SEQ ID NO:179 

MSAVALPRVSGGHDEHGHLEEFRTDPIGLMQRVRDECGDVGTFQLAGKQWLLSGSHANEFFFRA 

GDDDLDQAKAYPFMTPIFGEGWFDASPERRKEMLHNAALRGEQMKGHAATIEDQVRRMIADWGE 

AGEIDLLDFFAELTIYTSSACLIGKKFRDQLDGRFAKLYHELERGTDPLAYVDPYLPIESFRRRDEARN 

GLVALVADIMNGRIANPPTDKSDRDMLDVLIAVKAETGTPRFSADEITGMFISMMFAGHHTSSGTASW 

TLIELMRHRDAYAAVIDELDELYGDGRSVSFHALRQIPQLENVLKETLRLHPPLIILMRVAKGEFEVQG 

HRIHEGDLVAASPAISNRIPEDFPDPHDFVPARYEQPRQEDLLNRWTWIPFGAGRHRCVGAAFAIMQI 

KAIFSVLLREYEFEMAQPPESYRNDHSKMWQLAQPACVRYRRRTGV 

>Rv0861c - DNA helicase TB.seq 958524:960149 MW:59773 SEQ ID NO:180 

VQSDKWLLEVDHELAGAARAAIAPFAELERAPEHVHTYRITPLALWNARAAGHDAEQW 

RYAVPQPLLVDIVDTMARYGRLQLVKNPAHGLTLVSLDRAVLEEVLRNKKIAPMLGARIDDDTVWHP 

SERGRVKQLLLKIGWPAEDLAGWDGEAHPISLHQEGWQLRDYQRLAADSFWAGGSGNAA/LPCGA 

GKTLVGAAAMAKAGATTLILWNIVAARQWKRELVARTSLTENEIGEFSGERKEIRPNTriSTYQMITR 

TKGEYRHLELFDSRDWGLIIYDEVHLLPAPVFRMTADLQSKRRLGLTATLIREDGREGDVFSLIGPKR 

YDAPWKDIEAQGWIAPAECVEVRVTMTDSERMMYATAEPEERYRICSWHTKIAWKSILAKHPDEQ 

TLVIGAYLDQLDELGAELGAPVIQGSTRTSEREALFDAFRRGEVATLWSKVANFSIDLPEAAVAVQVS 

GTFGSRQEEAQRLGRILRPKADGGGAIFYSWARDSLDAEYAAHRQRFLAEQGYGYIIRDADDLLGP 

Al 

>Rv0904c accD3 TB.seq 1006694:1008178 MW:51741 SEQ ID NO:181 

VSRITTDQLRHAVLDRGSFVSWDSEPLAVPVADSYARELAAARAATGADESVQTGEGRVFGRRVAV 
VACEFDFLGGSIGVAAAERITAAVERATAERLPLLASPSSGGTRMQEGTVAFLQMVKIAAAIQLHNQA 
RLPYLWLRHPTTGGVFASWGSLGHLTVAEPGALIGFLGPRVYELLYGDPFPSGVQTAENLRRHGIID 
GWALDRLRPMLDRALTVLIDAPEPLPAPQTPAPVPDVPTWDSWASRRPDRPGVRQLLRHGATDR 
VLLSGTDQGEAATTliALARFGGQPTWLGQQRAVGGGGSTVGPAALREARRGMALAAELCLPLNA.
VIDAAGPALSAAAEQGGI^GQIAHCLAEL\ni-DTPWSILLGQGSGGPAUM^LPADRVlAALHGWLAP 
LPPEGASAIVFRDTAHAAELAAAQGIRSADLLKSGIVDTIVPEYPDAADEPIEFALRLSNAIAAEVHALR 
KIPAPERLATRLQRYRRIGLPRD 

>Rv0983 - TB.seq 1099064:1100455 MW:46454 SEQ ID NO:182

MAKLARWGLVQEEQPSDMTNHPRYSPPPQQPGTPGYAQGQQQTYSQQFDWRYPPSPPPQPTQY 
RQPYEALGGTRPGLIPGVIPTMTPPPGMVRQRPRAGMLAIGAVTIAWSAGIGGAAASLVGFNRAPA 
GPSGGPVAASAAPSIPAANMPPGSVEQVAAKWPSWMLETDLGRQSEEGSGIILSAEGLILTNNHVI 
AAAAKPPLGSPPPKTTVTFSDGRTAPFTWGADPTSDIAWRVQGVSGLTPISLGSSSDLRVGQPVLA
IGSPLGLEGTVTTGIVSALNRPVSTTGEAGNQN7VLDAIQTDAAINPGNSGGALVNMNAQLVGVNSAI 
ATLGADSADAQSGSIGLGFAIPVDQAKRIADEUSTGKASHASLGVQVTNDKDTLGAKIVEWAGGAA 
ANAGVPKGWVTKVDDRPINSADALVAAVRSKAPGATVALTFQDPSGGSRTVQVTLGKAEQ 

>Rv1008 - Similar to Ecoli protein YcfH TB.seq 1127087:1127878 MW:29066 SEQ ID NO:183
LVDAHTHLDACGARDADTVRSLVERAAAAGVTAWTVADDLESARWVTRAAEWDRRVYAAVALHPT 
RADALTDAARAELERLVAHPRWAVGETGIDMYWPGRLDGCAEPHVQREAFAWHIDLAKRTGKPLM 
IHNRQADRDVLDVLRAEGAPDTVILHCFSSDAAMARTCVDAGWLLSLSGTVSFRTARELREAVPLMP 
VEQLLVETDAPYLTPHPHRGLANEPYCLPYTVRAIAELVNRRPEEVALITTSNARRAYGLGWMRQ 

>Rv1009 - lipoprotein, similar to various other MTB proteins TB.seq 1128089:1129174 MW:38079
SEQ ID NO:184

MLRLWGALLLVLAFAGGYAVAACKTvTLTVDGTAMRvTTMKSRVIDIVEENGFSVDDRDDLYPAAG 
VQVHDADTIVLRRSRPLQISLDGHDAKQvWTTASTVDEALAQLAMTDTAPAAASRASRVPLSGMALP 
WSAKWQLNDGGLVRTVHLPAPNVAGLLSAAGVPLLQSDHWPAATAPIVEGMQIQvTRNRIKKvTE
RLPLPPNARRVEDPEMNMSREWEDPGVPGTQDvTFAVAEVNGVETGRLPVANWVTPAHEAWR 
VGTKPGTEVPPVIDGSIWDAIAGCEAGGNWAINTGNGYYGGVQFDQGTWEANGGLRYAPRADLAT 
REEQIAVAEvTRLRQGWGAWPVCAARAGAR 

>Rv1010 ksgA 16S rRNA dimethyltransferase TB.seq 1129150:1130100 MW:34647
SEQ ID NO:185 

MCCTSGCALTIRLLGRTEIRRLAKELDFRPRKSLGQNFVHDANTVRRWAASGVSRSDLVLEVGPGL 
GSLTUU.LDRGATWAVEIDPLUVSRLQQWAEHSHSEVHRLTWNRDVUM.RREDUVAAPTAWANL 
PYNVAVPALLHLLVEFPSIRWTVMVQAEVAERLAAEPGSKEYGVPSVKLRFFGRVRRCGMVSPTVF 
WPIPRVYSGLVRIDRYETSPWPTDDAFRRRVFELVDIAFAQRRKTSRNAFVQWAGSGSESANRLLAA
SIDPARRGETLSIDDFVRLLRRSGGSDEATSTGRDARAPDISGHASAS 

>Rv1011 - Similar to Ecoli protein YcbH TB.seq 1130189:1131106 MW:31350
SEQ ID NO:186

VPTGSVWRVPGKVNLYLAVGDRREDGYHELTWFH^^ 

ERNLAWQAAELMAEHVGRAPDVSIMIDKSIPVAGGMAGGSADAAAVLVAMNSLWELNVPRRDLRML 
AARLGSDVPFALHGGTALGTGRGEEU^TVLSRNTFHV\MJVFADSGLLTSAVYNELDRLREVG
GEPGPVU\ALAAGDPDQLAPLLGNEMCW\AVS^ 
ASSAI DVG AQLSG AGVCRTVRVATGPVPGARWSAPTEV 

>Rv1106c - cholesterol dehydrogenase TB.seq 1232845:1233954 MW:40743 SEQ ID NO:187 
MLRRMGDASLTTELGRVL\n-GGAGFVGANL\m\LDRGHWVRSFDRAPSLLPAHPQLEVLQGDITD
ADVCAAAVDGIDTIFHTAAIIELMGGASWDEYRQRSFAVNVGGTENLLHAGQRAGVQRFVYTSSNS 
WMGGQNIAGGDETLPYTDRFNDLYTETKWAERFVLAQNGVDGMLTCAIRPSGIWGNGDQTMFRK 
LFESVLKGHVKVLVGRKSARLDNSWHNLIHGFILAAAHLVPDGTAPGQAYFINDAEPINMFEFARPVL 
EACGQRWPKMRISGPAVRVVVMTGWQRLHFRFGFPAPLLEPLAVERLYLDNYFSIAKARRDLGYEPL 
FTTQQALTECLPYYVSLFEQMKNEARAEKTAATVKP

>Rv1110 lytB2 TB.seq 1236183:1237187 MW:36298 SEQ ID NO:188

MVPTVDMGIPGASVSSRSVADRPNRKRVLL^EPRGYCAGVDF^VEWERALQKHGPPWVRHEIVH 
NRHWDTLAKAGAVFVEETEQVPEGAIWFSAHGVAPTVHVSASERNLQVIDATCPLVTKVHNEARR 
FARDDYDILLIGHEGHEEWGTAGEAPDHVQLVDGVDAVDQVWRDEDKWWLSQTTLSVDETMEIV
GRLRRRFPKLQDPPSDDICYATQNRQVAVKAMAPECELVIWGSRNSSNSVRLVEVALGAGARAAH 
LVDWADDIDSAWLDG\mVGWSGASVPEVLVRGVLERUVECGYDIVQPVTTANETLVFALPRELRS 
PR 

>Rv1216c - TB.seq 1359473:1360144 MW:24863 SEQ ID NO:189

MHIGLKIFIWGVLGLWFGALLFGPAGTFDYWQAWVFLAAFVSTTIGPTIYLARNDPAALQRRMRSGP 

LAEGRTIQKFIVIGAFLGFFAMMVLSACDHRYGWSSVPAAVCVIGDVLVMTGLGIAMLWIQNRYAAS 

WRVEAGQILASDGLYKIVRHPMYAGNWMI^GIPl^ 

SGYREYRQLVRYRLVPYVW 

>Rv1223 htrA TB.seq 1365810:1367456 MW:56547 SEQ ID NO:190

VSHLSQRMAGLLRVHGEWSRSVDTRVDTDNAMPARFSAQIQNEDEVTSDQGNNGGPNGGGRLAP 
RPVFRPPVDPASRQAFGRPSGVQGSFVAERVRPQKYQDQSDFTPNDQLADPVLQEAFGRPFAGAE 
SLQRHPIDAGALAAEKDGAGPDEPDDPWRDPAAAAALGTPALAAPAPHGALAGSGKLGVRDVLFGG 
KVSYLALGILVAIALVIGGIGGVIGRKTAEWDAFTTSK\n"LSTTGNAQEPAGRFTlW
SVSDQEGMQGSGVIVDGRGYIVTNNHVISEAANNPSQFKTTWFNDGKEV^ 

VDNVDNLWARLGDSSKVRVGDEVLAVGAPLGLRSTVTQGIVSALHRPVPLSGEGSDTDTVIDAIQTD 
ASINHGNSGGPLIDMDAQVIGINTAGKSLSDSASGLGFAIPVNEMKLVANSLIKDGKIVHPTLGISTRSV

SNAIASGAQVANVKAGSPAQKGGILENDVIVKVGNRAVADSDEFWAVRQL^GQDAPIEWREGRH 

VTLTVKPDPDST 

>Rv1224 - TB.seq 1367461:1367853 MW:14083 SEQ ID NO:191

WANIGWWEMLVLVMVGLWLGPERLPGAIRWAASALRQARDYLSGVTSQLREDIGPEFDDLRGHL 
GELQKLRGMTPRAALTKHLLDGDDSLFTGDFDRPTPKKPDAAGSAGPDATEQiGAGPIPFDSDAT 

>Rv1229c mrp similar to MRP/NBP35 ATP-binding proteins TB.seq 1371778:1372947 MW:41064 
SEQ ID NO:192

MPSRLHSAVMSGTRDGDLNAAIRTALGKVIDPELRRPITELGMVKSIDTGPDGSVFIVEIYLT1AGCPKK 
SEITERVTRAVADVPGTSAVRVSLDVMSDEQRTELRKQLRGDTREPVIPFAQPDSLTRVYAVASGKG 
GVGKSTVTTVNLAAAMAVRGLSIGVLDADIHGHSIPRMMGTTDRP 

GNTPVVWRGPMLHRALQQFLADVYWGDLDVLLLDLPPGTGDVAISVAQLIPNAELLNA/^ 
VAERAGSIALQTRQRIVGWENMSGLTLPDGTTMQVFGEGGGRLVAERLSRAVGADVPLLGQIPLDP
ALVAAGDSGVPLVLSSPDSAIGKELHSIADGLSTRRRGLAGMSLGLDPTRR 

>Rv1239c corA magnesium and cobalt transport protein TB.seq 1381943:1383040 MW:41470
SEQ ID NO:193 

VFPGFDALPEVLRPVARPQPPNAHPVAQPPAQALVDCGVWCGQRLPGKYTYAAALREVREIELTG
QEAFVWIGLHEPDENQMQDVADVFGLHPLAVEDAVHAHQRPKLERYDETLFLVLKTVNYVPHESW 
UVREIX^GEIMIFVGKDFWTVRHGEHGGLSEVRKRMDADPEHLRLGPYAVMHAIADYWDHYLE^ 
NLMETDIDSIEEVAFAPGRKLDIEPIYLLKREWELRRCVNPLSTAFQRMQTESKDLISKEVRRYLRDV 
ADHQTEAADQIASYDDMLNSLVQAAI^RVGMQQNMDMRKISAWAGIIAVPTMIAGIYGMNFHFMPEL 

DSRWGYPTVIGGMVLICLFLYHVFRNRNWL

>Rv1279 - TB.seq 1430060:1431643 MW:57332 SEQ ID NO:194

MDTQSDYVWGTGSAGAWASRLSTDPATTVVALEAGPRDKNRFIGVPAAFSKLFRSEIDWDYLTEP 
QPELDGREIYWPRGKN^GGSSSMNAMMWVRGFASDYDEWAARAGPRWSYADVLGYFRRIENVTA 
AWHFVSGDDSG\rTGPLHISRQRSPRSVTAAWLAAARECGFAAARPNSPRPEGFCETWTQRRGAR 

FSTADAYLKPAMRRKNLRVLTGATATRWIDGDRAVGVEYQSDGQTRIVYARREWLCAGAVNSPQL
LMLSGIGDRDHLAEHDIDTWHAPEVGCNLLDHL\nVLGFDVEKDSLFAAEKPGQLISYLLRRRGMLT 
SNVGEAYGFVRSRPELKLPDLELIFAPAPFYDEALVPPAGHGWFGPILVAPQSRGQITLRSADPHAK 
PVIEPRYLSDLGGVDRAAMMAGLRICARIAQARPLRDLLGSIARPRNSTELDEATLELALATCSHTLYH 
PMGTCRMGSDEASWDPQLRVRGVDGLRVADASVMPSTVRGHTHAPSVLIGEKAADLIRS 

>Rv1294 thrA homoserine dehydrogenase TB.seq 1449373:1450695 MW:45522 SEQ ID NO:195
VPGDEKPVGVAVLGLGNVGSEWRIIENSAEDLAARVGAPLVLRGIGVRRVTTDRGVPIELLTDDIEEL 
VAREDVDIWEVMGPVEPSRKAILGALERGKSNA/TANKALLATSTGELAQAAESAHVDLYFEAAVAGA 
I PVI RPLTQSLAGDTVLRVAGI VNGTTN Yl LSAMDSTGAD YASALADASALGYAEADPTADVEGYDAA
AKAAIUVSIAFHTRWADDWREGITK\n"PADFGSAHALGCTIKLLSICERITTDEGSQRVSARW 
PLSHPUVAVNGAFNAVWEAEAAGRLMFYGQGAGGAPTASAVTGDLVMAARNRVLGSRGPRESK^ 
AQLPVAPMGFIETRYYVSMNVADKPGVLSAVAAEFAKREVSIAEVRQEGWDEGGRRVGARI\A/VTH 
LATDAALSETVDALDDLDWQGVSSVIRLEGTGL

>Rv1323 fadM acetyl-CoA C-acetyltransferase (aka thiL) TB.seq 1485860:1487026 MW:40049
SEQ ID NO:196 

VIVAGARTPIGKLMGSLKDFSASELGAIAIKGALEKANVPASLVEYVIMGQVLTAGAGQMPARQAAVA 
AGIGWDVPALTINKMCLSGIDAIALADQLIRAREFDVWAGGQESMTKAPHLL^NSRSGYKYGDVW
DHMAYDGLHDVFTDQPMGALTEQRNDVDMFTRSEQDEYAAASHQKAAAAWKDGVFADEVIPVNIP 
QRTGDPLQFTEDEGIRANTTAAALAGLKPAFRGDGTITAGSASQISDGAAAWVMNQEKAQELGLTW 
LAEIGAHGWAGPDSTLQSQPANAINKALDREGISVDQLDWEINEAFAAVALASIRELGLNPQIVNVN 
GGAIAVGHPLGMSGTRITLHAALQLARRGSGVGVAALCGAGGQGDALILRAG 

>Rv1389 gmk putative guanylate kinase TB.seq 1564399:1565022 MW:22064 SEQ ID NO:197

VSVGEGPDTKPTARGQPAAVGRVWLSGPSAVGKSTWRCLRERIPNLHFSVSATTRAPRPGEVDG 

VDYHFIDPTRFCK3LIDQGELLEWAEIHGGLHRSGTI^ 

AVTVFLAPPSWQDLQARLIGRGTETADVIQRRLDTARIELAAQGDFDKVWNRRLE 
TAPGSP

>Rv1407 fmu similar to Fmu protein TB.seq 1583099:1584469 MW:48494 SEQ ID NO:198 
MTPRSRGPRRRPLDPARRAAFETLRAVSARDAYANLVLPALLAQRGIGGRDAAFATELTYGTCRAR 
GLLDAVIGAAAERSPQAIDPVLLDLLRLGTYQLLRTRVDAHAAVSTTVEQAGIEFDSARAGFVNGVLR 
TIAGRDERSWVGELAPDAQNDPIGHAAFVHAHPRWIAQAFADALGAAVGELEAVLASDDERPAVHLA
ARPGVLTAGEL^RAVRGWGRYSPFAVYLPRGDPGRLAPVRDGQALVQDEGSQLVARALTLAPVDG 
DTGRWLDLCAGPGGKTALLAGLGLQCAARVTAVEPSPHRADLVAQNTRGLPVELLRVDGRHTDLDP 
GFDRVLVDAPCTGLGALRRRPEARWRRQPADVAAUM<LQRELLSAAIALTRPGGWLYATCSPHLAE 
TVGAVADALRRHPVHALDTRPLFEPVIAGLGEGPHVQLWPHRHGTDAMFAAALRRLT 

>Rv1409 ribG riboflavin biosynthesis TB.seq 1585192:1586208 MW:35367 SEQ ID NO:199 
MNVEQVKSIDEAMGU^IEHSYQVKGTTYPKPPVGAVIVDPNGRIVGAGGTEPAGGDHAEWALRRAG 
GLAAGAIWVTMEPCNHYGKTPPCVNALIEARVGTWYAVADPNGIAGGGAGRLSAAGLQVRSGVLA 
EQVAAGPLREWLHKQRTGLPHVTWKYATSIDGRSAAADGSSQWISSEAARLDLHRRRAIADAILVG 
GWLADDPALTARUVDGSLAPQQPLRVWGKRDIPPEARVLNDEARTMMIRTHEPMEVLRALSDRTD
VLLEGGPTLAGAFLRAGAINRILAYVAPILLGGPVTAVDDVGVSNITNALRWQFDSVEKVGPDLLLSLV 
AR 

>Rv1440 secG TB.seq 1617715:1618065 MW:12140 SEQ ID NO:200

VAGVTAAVSARLKADEARRPGFYAAGSGPLPQVRGSTLPVMELALQITU 

GLSTLFGGGVQSSLSGSTWEKNLDRLTLFVTGIWLVSIIGVALLIKYR 

>Rv1484 inhA TB.seq 1674200:1675006 MW:28529 SEQ ID NO:201

MTGLLDGKRILVSGIITDSSIAFHIARVAQEQGAQLVLTGFDRLRLIQRITDRLPAKAPLLELDVQNEEH 
U^SLAGRVTEAIGAGNKLDGWHSIGFMPQTGMGINPFFDAPYADVSKGIHISAYSYASMAKALLPIM 
NPGGSIVGMDFDPSRAMPAYNWMWAKSALESVNRFVAREAGKYGVRSNLVAAGPIRTLAMSAIVG 
GALGEEAGAQIQLLEEGWDQRAPIGWNMKDATFVAKWCALLSDWLPATTGDIIYADGGAHTQLL 

>Rv1617 pykA pyruvate kinase TB.seq 1816187:1817602 MW:50668 SEQ ID NO:202 
VTRRGKIVCTLGPATQRDDLVRALVEAGMDVARMNFSHGDYDDHKVAYERVRVASDATGRAVGVL 
ADLQGPKIRLGRFASGATHWAEGETVRIWGACEGSHDRVSTTYKRLAQDAVAGDRVLVDDGKVAL 
WDAVEGDDWCTWEGGPVSDNKGISLPGMNVTAPALSEKDIEDLTFALNLGVDMVALSFVRSPAD 
VELVHEVMDRIGRRVPVIAKLEKPEAIDNLEAIVLAFDAVMVARGDLGVELPLEEVPLVQKRAIQMARE
NAKPVIVATQMLDSMIENSRPTRAEASDVANAVLDGADALMLSGETSVGKYPLAAVRTMSRIICAVEE 
NSTAAPPLTHIPRTKRGVISYAARDIGERLDAKALVAFTQSGDWRRLARLHTPLPLLAFTAWPEVRS 
QLAMTWGTETFIVPKMQSTDGMIRQVDKSLLEU\RYKRGDLWIVAGAPPGTVGSTNLIHVHRIGEDD 
V 

>Rv1630 rpsA 30S ribosomal protein S1 TB.seq 1833540:1834982 MW:53203 SEQ ID NO:203
MPSPTVTSPQVAVNDIGSSEDFU^AIDKTIKYFNDGDIVEGTIVKVDRDEVLLDIGYKTEGVI^ 
HDVDPNEWSVGDEVEALVLTKEDKEGRLILSKKRAQYERAWGTIEALKEKDEAVKGTVIEWKGGLI 
LDIGLRGFLPASLVEMRRVRDLQPYIGKEIEAKIIELDKNRNNWLSRRAWLEQTQSEVRSEFLNNLQK 
GTIRKGWSSIVNFGAFVDLGGVDGLVHVSELSWKHIDHPSEWQVGDEVTVEVLDVDMDRERVSLS
LKATQEDPWRHFARTHAIGQIVPGKVTKLVPFGAFVRVEEGIEGLVHISELAERHVEVPDQWAVGDD 
AMVKVIDIDLERRRISLSLKQANEDYTEEFDPAKYGMADSYDEQGNYIFPEGFDAETNEWLEGFEKQ 
RAEWEARYAEAERRHKMHTAQMEKFAAAEAAGRGADDQSSASSAPSEKTAGGSLASDAQLAALRE 
KLAGSA 

>Rv1631 - TB.seq 1835011:1836231 MW:44669 SEQ ID NO:204

MLRIGLTGGIGAGKSLLSTTFSQCGGIWDGDVLAREWQPGTEGI^SLVDAFGRDILLADGALDRQA 
LAAKAFRDDESRGVLNGIVHPLVARRRSEIIAAVSGDAWVEDIPLLVESGMAPLFPLVAA/VHADVELR 
VRRLVEQRGMAEADARARIAAC^SDQQRr^VADN^DNSGSPEDLVRF^RDWNTRVQPFAHNL 
AQRQIARAPARLVPADPSWPDQARRIVNRLKIACGHKALRVDHIGSTAVSGFPDFLAKDVIDIQVTVE
SLDVADELAEPLLAAGYPRLEHITQDTEKTDARSWGRYDHTDSAALWHKRVHASADPGRPTNVHLR 
VHGWPNQQFALLFVDWLAANPGAREDYLTVKCDADRRADGELARYWAKEPW
DAVHWRP 

>Rv1706c - TB.seq 1932695:1933876 MW:39779 SEQ ID NO:205 
MTLDVPVNQGHVPPGSVACCLVGVTAVADGIAGHSLSNFGALPPEINSGRMYSGPGSGPLMAAAAA
WDGLAAELSSAATGYGAAISELTNMRWWSGPASDSMVAAVLPFVGWLSTT^ 
AAFEAAFAMWPPPAIAANRTLLMTLVDTNWFGQNTPAIATTESQYAEMWAQDAAAMYGYASAAAP 
ATVLTPFAPPPQTTNATGLVGHATAVAALRGQHSWAAAIPWSDIQKYWMMFLGALATAEGFIYDSG 
GLTLNALQFVGGMLWSTALAEAGAAEAAAGAGGAAGWSAWSQLGAGPVAASATIAAKIGPMSVPP 
GWSAPPATPQAQWARSIPGIRSAAEAAETSVLLRGAPTPGRSRAAHMGRRYGRRLTVMADRPNVG

>Rv1745c - similar to Q46822 ORF_0182 TB.seq 1971381:1971989 MW:22490 SEQ ID NO:206 
MTRSYRPAPPIERNAA.LNDRGDATGVADKATVHTGDTPLHLAFSSYVFDLHDQLLITRRAATKRTWP 
AVWTNSCCGHPLPGESLPGAIRRRL^ELGLTPDRVDLILPGFRYRAAMADGTVENEICPVYRVQVD 
QQPRPNSDEVDAIRWLSWEQFVRDVTAGVIAPVSPWCRSQLGYLTKLGPCPAQWPVADDCRLPKA
AHGN 

>Rv1800 - TB.seq 2039451:2041415 MW:67068 SEQ ID NO:207

MLPNFAVLPPEVNSARVFAGAGSAPMLAAAAAWDDLASELHCAAMSFGSVTSGLWGWWQGSASA 
AN4VDAAASYIGWLSTSAAHAEGAAGI^RAAVSVFEEALAAWHPAMVAANRAQVASLVASNLFGQN
APAIAALESLYECMWAQDAAAMAGYYVGASAVATQLASWLQRLQSIPGAASLDARLPSSAEAPMGV 
VRAVNSAIAANAAAAQWGLVMGGSGTPIPSARWELANALYMSGSVPGVIAQALFTPQGLYPNAA/IK 
NLTFDSSVAQGAVILESAIRQQIAAGNNVTVFGYSQSATISSLVMANLAASADPPSPDELSFTLIGNPN 
NPNGGVATRFPGISFPSLGVTATGATPHNLYPTKIYTIEYDGVADFPRYPLNFVSTLNAIAGTYYVHSN 
YFILTPEQIDAAVPLTNTVGPTMTQYYIIRTENLPLLEPLRSVPIVGNPLANLVQPNLKVIVNLGYGDPA
YGYSTSPPNVATPFGLFPEVSPWIADALVAGTQQGIGDFAYDVSHLELPLPADGSTMPSTAPGSGT 
PVPPLSIDSLIDDLQVANRNLANTISKVAATSYATVLPTADIANAALTIVPSYNIHLFLEGIQQALKGDPM 
GLVN AVG YPLAADVALFTAAGG LQLLI 1 1 SAG RTI AN Dl S Al VP 

>Rv1844c gnd 6-phosphogluconate dehydrogenase (Gram -) TB.seq 2093732:2095186
MW:51548 SEQ ID NO:208 

MSSSESPAGIAQIGVTGLAVMGSNIARNFARHGYWAVHNRSVAKTDALLKEHSSDGKFVRSETIPEF 
LAALEKPRRVLIMVKAGEATDADAVINELADAMEPGDIIIDGGNALYTDTMRREKAMRERGLHFVGAG 
ISGGEEGALNGPSIMPGGPAESYQSLGPLLEEISAHVDGVPCCTHIGPDGSGHFVKMVHNGIEYSDM 
QLIGEAYQLMRDGLGLTAPAIADVFTEWNNGDLDSYLVEITAEVLRQTDAKTGKPLVDVIVDRAEQKG
TGRWTVKSALDLGVPVTGIAEAVFARALSGSVGQRSAASGLASGKLGEQPADPATFTEDVRQALYA 
SKIVAYAQGFNQIQAGSAEFGWDITPGDLATIWRGGCIIRAKFLNHIKEAFDASPNLASLIVAPYFRGA 

WO 01/35317 



PCT/US00/31152



VESAIDSWRRWSTAAQLGIPTPGFSSALSYYDALRTARLPAALTQAQRDFFGAHTYGRIDEPGKFHT 
LWSSDRTEVPV 

>Rv1900c lipJ TB.seq 2146246:2147631 MW:49685 SEQ ID NO:209 
VAQAPHIHRTRYAKCGDMDIAYQVLGDGPTDLLVLPGPFVPIDSIDDEPSLYRFHRRLASFSRVIRLDH
RGVGLSSRLAAITTLGPKFWAQDAIAN^DAVGCEQATIFAPSFHAMNGLVLAADYPERVRSUWNGS 
ARPLWAPDYPVGAQVRRADPFLTVALEPDAVERGFDVLSIVAPTVAGDDVFRAVVWDLAGNRAGPP 
SIARAVSKVIAEADVRDVLGHIEAPTLILHRVGSTYIPVGHGRYLAEHIAGSRLVELPGTDTLYWVGDT 
GPMLDEIEEFITGVRGGADAERMLATIMFTDIVGSTQHAAALGDDRWRDLLDNHDTIVCHEIQRFGGR 
EVNTAGDGFVATFTSPSAAIACADDIVDAVAALGIEVRIGIHAGEVEVRDASHGTDVAGVAVHIGARVC
ALAGPSEVLVSSTVRDIVAGSRHRFAERGEQELKGVPGRWRLCVLMRDDATRTR 

>Rv1967 - TB.seq 2210599:2211624 MW:36516 SEQ ID NO:210
MRENLGGVWRLGVFLAVCLLTAFLLIAVFGEVRFGDGKTWAEFAIW^ 
RISINPDAWRVQFTADNS\m.TRGTRAVIRYDNLFGDRYLALEEGAGGLAVLRPGHTIPLARTQPALD
LDALIGGFKPLFRALNPEQVNALSEQLLHAFAGQGPTIGSIXAQSAAVTNTLADRDRLIGQVITNLNW 
LGSLGAHTDRLDQAWSLSALIHRLAQRKTDISNAVAYTNAAAGSVADLLSC^RAPL^KWRETDRVA 
GIAAADHDYLDNLLNTLPDKYQALVRQGMYGDFFAFYLCDWLKVNGKGGQPVYIKLAGQDSGRCA 
PK 

>Rv1975 - TB.seq 2218050:2218712 MW:23650 SEQ ID NO:211
MSRRASATCALSATTAVAIMAAPAARADDKRLNDGWANVYTVQRQAGCTND 
TLDLLNNRHLNDDTGSDGSTPQDRAHAAGFRGKVAETVAINPAVAISGIELINQWYYNPAFFAIMSDC 
ANTQIGVWSENSPDRTVWAWGQPDRPSAMPPRGAVTGPPSPVAAQENVPIDPSPDYDASDEIEY 
GINWLPWILRGVYPPPAMPPQ

>Rv1981c nrdF ribonucleotide reductase small subunit TB.seq 2224221:2225186 MW:36591 
SEQ ID NO:212 

MTGKLVERVHAINWNRLLDAKDLQVWERLTGNFWLPEK^ 
LLDTAQAWGAVAMIDDAVTPHEEAVLTNMAFMESVHAKSYSSIFSTLCSTKQIDDAFDWSEQNPYL
QRKAQIIVDYYRGDDALKRKASSVMLESFLFYSGFYLPMYWSSRGKLTNTADLIRLIIRDEAVHGYYIG
YKCQRGLADLTDAERADHREYTCELLHTLYANEIDYAHDLYDELGWTDDVLPYMRYNANKALANLG 
YQPAFDRDTCQVNPAVRAALDPGAGENHDFFSGSGSSYVMGTHQPTTDTDWDF 

>Rv2092c helY helicase, Ski2 subfamily TB.seq 2349335:2352052 MW:99576 SEQ ID NO:213
VTELAELDRFTAELPFSLDDFQQRACSALERGHGVLVCAPTGAGKTWGEFAV^ 
PLKALSNQKHTDLTARYGRDQIGLLTGDLSVNGNAPWVMTTEVLRNMLYADSPALQGLSYWMDE 
VHFUtoRMRGPVWEEVILQLPDDVRWSLSAW

LVGKRMFDLFDYRIGEAEGQPQ\^REI±RHIAHRREADRMADWQPRRRGSGRPGFYRPPGRPEVI 
AKLDAEGLLPAITF^FSRAGCDAAWQCLRSPLRLTSEEERARIAEVIDHRCGDUKDSDLAVLGYYEW 
REGLLRGLAAHHAGMLPAFRHTVEELFTAGLVKAVFATETLALGINMPARTWLERLV^ 
LTPGEYTQLTGRAGRRGIDVEGHAWIWHPEIEPSEVAGLASTRTFPLRSSFAPSYNMTINLVHRMGP
QQAHRLLEQSFAQYQADRSWGLVRGIERGNRILGEIAAELGGSDAPILEYARLRARVSELERAQARA 
SRLQRRQAATDAU^RRGDIITITHGRRGGLA\A^ESARDRDDPRPL\^TEHRWAGRISSADYSGTT 
PVGSMTLPKRVEHRQPRVRRDLASALRSAAAGLVIPAARRVSEAGGFHDPELESSREQLRRHPVHT 
SPGLEDQIRQAERYLRIERDNAQLERKVAAATNSLARTFDRFVGLLTEREFIDGPATDPVVTDDGRLL 
ARIYSESDLLVAECLRTGAWEGLKPAELAGWSAWYETRGGDGQGAPFGADVPTPRLRQALTQTS
RLSTTLRADEQAHRITPSREPDDGFVRVIYRWSRTGDLAAALAAADVNGSGSPLU^GDFVRWCRQV 
LDLLDQVRNAAPNPELRATAKRAIGDIRRGWAVDAG 

>Rv2101 helZ helicase, Snf2/Rad54 family TB.seq 2360238:2363276 MW:111632
SEQ ID NO:214

MLVLHGFWSNSGGMRLWAEDSDLLVKSPSQALRSARPHPFAAPADLIAGIHPGKPATAVLLLPSLRS 
APLDSPELIRLAPRPAARTDPMLI^WTVPWDLDPTAALAAFDQPAPDVRYGASVDYLAELAVFAREL 
VERGRVLPQLRRDTHGAAACWRPVLQGRDWAMTSLVSAMPPVCRAEVGGHDPHELATSALDAMV
DAAVRAALSPMDLLPPRRGRSKRHRAVEAWLTALTCPDGRFDAEPDELDALAEALRPWDDVGIGTV 

GPARATFRLSEVETENEETPAGSLWRLEFLLQSTQDPSLLVPAEQAWNDDGSLRRWLDRPQELLLT
ELGRASRIFPELVPALRTACPSGLELDADGAYRFLSGTAAVLDEAGFGVLLPSWWDRRRKLGLVLSA 
YTPVDGWGKASKFGREQLVEFRWELAVGDDPLSEEEIAALTETKSPLIRLRGQWVALDTEQMRRGL 
EFLERKPTGRKTTAEILAL^SHPDDVDTPLEWAVRADGWLGDLLAGAAAASLQPLDPPDGFTATLR 
PYQQRGI^WUVFLSSLGLGSCLADDMGLGKWQLLALETLESVQRHQDRGVGPTLLLCPMSLVGN 

WPQEAARFAPNLRWAHHGGARLHGEALRDHLERTDLWSTYTTATRDIDELAEYEWNRWLDEAQ
AVKNSLSRAAKAVRRLRAAHRVALTGTPMENRLAELWSIMDFLNPGLLGSSERFRTRYAIPIERHGHT 
EPAERLRASTRPYILRRLKTDPAIIDDLPEKIEIKQYCQLTTEQASLYQAWADMMEKIENTEGIERRGN 
VLAAMAKLKQVCNHPAQLLHDRSPVGRRSGKVIRLEEILEEIUVEGDRVLCFTQFTEFAELLVPHLAAR 
FGRAARDIAYLHGGTPRKRRDEMVARFQSGDGPPIFLLSLKAGGTGLNLTAANHWHLDRWWNPAV 

ENC^TDRAFRIGQRRWQVRKFICTGTLEEKIDEMIEEKKAI^DLNA/TCXBEGWLTELSTRDLREVFAL
SEGAVGE 

>Rv2110c prcB proteasome [beta]-type subunit 2 TB.seq 2369727:2370599 MW:30274
SEQ ID NO:215 

\mWPLPDRLSINSLSGTPAVDLSSFTDFLRRQAPELLPASISGGAPLAGGDAQLPHGTTIVALKYPGG
WMAGDRRSTQGNMISGRDVRKVYITDDYTATGIAGTAAVAVEFARLYAVELEHYEKLEGVPLTFAG 
KINRLAIMVRGNLAAAMQGLLALPLLAGYDIHASDPQSAGRIVSFDAAGGWNIEEEGYQAVGSGSLFA 
KSSMKKLYSQVTDGDSGLRVAVEALYDAADDDSATGGPDLVRGIFPTAVIIDADGAVDVPESRIAELA
RAIIESRSGADTFGSDGGEK 

>Rv2118c - = B2126_C1_165 (83.6%) TB.seq 2377471:2378310 MW:30091 SEQ ID NO:216 
VSATGPFSIGERVQLTDAKGRRYTMSLTPGAEFHTHRGSIAHDAVIGLEQGSWKSSNGALFLVLRPL
LVDYVMSMPRGPQVIYPKDAAQIVHEGDIFPGARVLEAGAGSGALTLSLLRAVGPAGQVISYEQRAD 
HAEHARRNVSGCYGQPPDNWRLWSDLADSELPDGSVDRAVLDMLAPWEVLDAVSRLLVAGGVLM 
Nm/ATVTQLSRIVEALRAKCKJV^PRAWETLQRGWNWGLAVRPQHSMRGHTAFLVATRRLAPGA 
VAPAPLGRKREGRDG 

>Rv2144c - TB.seq 2404166:2404519 MW:12028 SEQ ID NO:217 

MLIIALVLM.IGLLALVFAVVTSNQLVAWVCIGASVLGVALLIVDALRERQQGGADEADGAGETGVAEE 
ADVDYPEEAPEESQAVDAGVIGSEEPSEEASEATEESAVSADRSDDSAK 

>Rv2146c - TB.seq 2405667:2405954 MW:10805 SEQ ID NO:218

LWFFQILGFALFIFWLLLIARVWEFIRSFSRDWRPTGV7WILEIIMSITDPPVKVLRRLIPQLTIGAVRF 
DLSIMVLLLVAFIGMQLAFGAAA

>Rv2147c - TB.seq 2406119:2406841 MW:27630 SEQ ID NO:219
VNSHCSHTFITDNRSPRARRGHAMSTLHKVKAYFGMAPMEDYDDEYYDDRAPSRGYARPRFDDDY
GRYDGRDYDDARSDSRGDLRGEPADYPPPGYRGGYADEPRFRPREFDRAEMTRPRFGSWLRNST 
RGALAMDPRRMAMMFEDGHPLSKITTLRPKDYSEARTIGERFRDGSPVIMDLVSMDNADAKRLVDF 
AAGLAFALRGSFDKVATKVFLLSPADVDVSPEERRRIAETGFYAYQ 

>Rv2148c - TB.seq 2406841:2407614 MW:27694 SEQ ID NO:220

MAADLSAYPDRESELTHALAAMRSRLAAAAEAAGRNVGEIELLPITKFFPATDVAILFRLGCRSVGES 
REQEASAKMAELNRLLAAAELGHSGGVHWHMVGRIQRNKAGSLARWAHTAHSVDSSRLVTALDRA 
WAALAEHRRGERLRVYVQVSLDGDGSRGGVDSTTPGAVDRICAQVQESEGLELVGLMGIPPLDWD 
PDEAFDRLQSEHNRVRAMFPHAIGLSAGMSNDLEVAVKHGSTCVRVGTALLGPRRLRSP 

>Rv2150c ftsZ TB.seq 2408386:2409522 MW:38757 SEQ ID NO:221 

MTPPHNYLAVIKWGIGGGGVNAVNRMIEQGLKGVEFIAINTDAQALLMSDADVKLDVGRDSTRGLG 
AGADPEVGRKAAEDAKDEIEELLRGADMVFVTAGEGGGTGTGGAPWASIARKLGALTVGVVTRPF 
SFEGKRRSNQAENGIAALRESCDTLIVIPNDRLLQMGDAAVSLMDAFRSADEVLLNGVQGITDLITTP 
GLINVDFADVKGIMSGAGTALMGIGSARGEGRSLKAAEIAINSPLLEASMEGAQGVLMSIAGGSDLGL
FEINEAASLVQDAAHPDANIIFGTVIDDSLGDEVRVTVIAAGFDVSGPGRKPVMGETGGAHRIESAKA 
GKLTSTLFEPVDAVSVPLHTNGATLSIGGDDDDVDVPPFMRR 

>Rv2152c murC TB.seq 2410639:2412120 MW:51146 SEQ ID NO:222

VSTEQLPPDLRRVHMVGIGGAGMSGIARILLDRGGLVSGSDAKESRGVHALRARGALIRIGHDASSL 
DLLPGGATAWTTHAAIPKTNPELVEARRRGIPWLRPAVUVK^^ 

LQHCGLDPSFAVGGELGEAGTNAHHGSGDCFVAEADESDGSLLQYTPHVAVITNIESDHLDFYGSVE 

AYVAVFDSFVERIVPGGALWCTDDPGGAALAQRATELGIRVLRYGSVPGETMAATLVSWQQQGVG 

AVAHIRLASEI^TAQGPRVMRLSWGRHMALNALGALLAAVQIGAPADEVLDGLAGFEGVRRRFELV 

GTCGVGKASVRVFDDYAHHPTEISATLAAARMVLEQGDGGRCMWFQPHLYSRTKAFAAEFGRALN 

MDEVFVLDWGAREQPI^GVSGASVAEHVTVPMRWPDFSAVAQQVAAAASPGDVIWMGAGDVT 

LLGPEILTALRVRANRSAPGRPGVLG 

>Rv2153c murG TB.seq 2412120:2413349 MW:41829 SEQ ID NO:223

VKDTVSQPAGGRGATAPRPADAASPSCGSSPSADSVSNA/LAGGGTAGHVEPAMAVADALVALDPR 

VRITALGTLRGLETRLVPQRGYHLELITAVPMPRKPGGDU\RLPSRVWRAVREARDVLDDVDADVW 

GFGGYVALPAYLAARGLPLPPRRRRRIPWIHEANARAGLANRVGAHTADRVLSAVPDSGLRRAEW 

GVPVRASIAALDRAVLRAEARAHFGFPDDARVLLVFGGSQGAVSLNRAVSGAAADLAAAGVCVLHA 

HGPQNVLELRRRACX5DPPWAVPYLDRMEUVYAAADLVICRAGAMTVAEVSAVGLPAIYVPLPIGNG 

EQRLNALPWNAGGGMWADAALTPELVARQVAGLLTDPARLAAMTAAAARVGHRDAAGQVARAAL 

AVATGAGARTTT 

>Rv2154c ftsW TB.seq 2413349:2414920 MW:56306 SEQ ID NO:224 

VLTRLLRRGTSDTDGSQTRGAEPVEGQRTGPEEASNPGSARPRTRFGAWLGRPMTSFHLIIAVAALL 

TTLGLIMVLSASAVRSYDDDGSAVWIFGKQVLWTLVGLIGGWCLRMSVRFMRRIAFSGFAITIVMLV^ 

VLVPGIGKEANGSRGWFWAGFSMQPSELAKMAFAIWGAHLLAARRMERASLREMLIPLVPAAWAL 

ALIVAQPDLGQWSMGIILLGLLWYAGLPLRVFLSSLAAWVSAAILAVSAGYRSDRVRSWLNPENDP 

QDSGYQARQAKFALAQGGIFGDGLGQGVAKWNYLPNAHNDFIFAIIGEELGLVGALGLLGLFGLFAY 

TGMRIASRSADPFLRLLTATTTLWVLGQAFINIGWIGLLPWGLQLPLISAGGTSTAATLSLIGIIANAAR 

HEPEAVAALRAGRDDKVNRLLRLPLPEPYLPPRLEAFRDRKRANPQPAQTQPARKTPRTAPGQPAR 

QMGLPPRPGSPRTADPPVRRSVHHGAGQRYAGQRRTRRVRALEGQRYG 

>Rv2155c murD TB.seq 2414935:2416392 MW:49314 SEQ ID NO:225 

VLDPLGPGAPVLVAGGRVTGQAVAAVLTRFGATPWCDDDPVMLRPHAERGLPTVSSSDAVQQITG 
YALWASPGFSPATPLLAAAAAAGVPIWGDVEI^WRLDAAGCYGPPRSWLWTGTNGKTTTTSMLH 
AMLIAGGRRAVLCGNIGSAVLDVLDEPAELLAVELSSFQLHWAPSLRPEAGAVLNIAEDHLDWHATM 
AEYTAAKARVLTGGVAVAGLDDSRAAALLDGSPAQVRVGFRLGEPAARELGVRDAHLVDRAFSDDL 
TLLPVASIPVPGPVGVLDALAAAALARSVGVPAGAIADAVTSFRVGRHRAEWAVADGITYVDDSKAT 
NPHAARASVUKYPRVVWIAGGU-KGASLHAEVAAMASRLVGAVLIGRDRAAVAEALSRH 

WAGEDTGMPATVEVPVACVLDVAKDDKAGETVGAAVMTAAVAAARRMAQPGDTVLLAPAGASFD 
QFTGYADRGEAFATAVRAVIR 

>Rv2156c murX TB.seq 2416397:2417473 MW:37714 SEQ ID NO:226
MRQILIAVAVAVTVSILLTPVLIRLFTKQGFGHQIREDGPPSHHTKRGTPSMGGVAILAGIWAGYLGAH
LAGLAFDGEGIGASGLLVLGLATALGGVGFIDDLIKIRRSRNLGLNKTAKTVGQITSAVLFGVLVLQFRN 
AAGLTPGSADLSWREIATVTLAPVLFVLFCWIVSAWSNAVNFTDGLDGLAAGTMAMVTAAYVLITF 
WQYRNACVTAPGLGCYNVRDPLDLALIAAATAGACIGFLWWNAAPAKIFMGDTGSLALGGVIAGLSV 
TSRTEILAWLGALFVAEITSWLQILTFRTTGRRMFRMAPFHHHFELVGWAF/TTVIIRFWLLTAITCGL 
GVALFYGEWLAAVGA

>Rv2157c murF TB.seq 2417473:2419002 MW:51634 SEQ ID NO:227 

MIELTVAQIAEIVGGAVADISPQDAAHRRVTGTVEFDSRAIGPGGLFLALPGARADGHDHAASAVAAG 
AAWLAARPVGVPAIWPPVAAPNVLAGVLEHDNDGSGAAVLAALAKLATAVAAQLVAGGLTIIGITGS

SGKTSTKDLMAAVLAPLGEWAPPGSFNNELGHPVyrrVLRATRRTDYLILEMAARHHGNIAALAEIAPP
SIGWLNVGTAHLGEFGSREVIAQTKAELPQAVPHSGAWLNADDPAVAAMAKLTAARWRVSRDNT 
GDVWAGPVSLDELARPRFTLHAHDAQAEVRLGVCGDHQVTNALCAAAVALECGASVEQVAAALTAA 
PPVSRHRMQVTTRGDGVTVIDDAYNANPDSMRAGLQALAWIAHQPEATRRSWAVLGEMAELGEDAI 
AEHDRIGRLAVRLDVSRLVWGTGRSISAMHHGAVLEGAWGSGEATADHGADRTAVNVADGDAALA 

LLRAELRPGDWLVKASNAAGLGAVADALVADDTCGSVRP

>Rv2158c murE TB.seq 2419002:2420606 MW:55310 SEQ ID NO:228 

VSSI^RGISRRRTEVATQVEAAPTGLRPNAWGVRLAALADQVGAALAEGPAQRAVTEDRTVTGVTL 
RAQDySPGDLFAALTGSTTHGARHVGDAIARGAVAVLTDPAGVAEIAGRAAVPVLVHPAPRGVLGGL 

AATWGHPSERLWIGITGTSGKTTTTYLVEAGLRAAGRVAGLIGTIGIRVGGADLPSALTTPEAPTLQA
MLAAMVERGVDTWMEVSSHALALGRVDGTRFAVGAFTNLSRDHLDFHPSMADYFEAKASLFDPDS 
ALRARTAWCIDDDAGRAMAARAADAITVSAADRPAHWRATDVAPTDAGGQQFTAIDPAGVGHHIGI 
RLPGRYNVANCLVALAILDTVGVSPEQAVPGLREIRVPGRLEQIDRGQGFLALVDYAHKPEALRSVLT 
TLAHPDRRLAWFGAGGDRDPGKRAPMGRIAAQLADLVWTDDNPRDEDPTAIRREILAGAAEVGGD 

AQWEIADRRDAIRHAVAWARPGDWLIAGKGHETGQRGGGRVRPFDDRVELAAALEALERRA

>Rv2159c - TB.seq 2420632:2421663 MW:36377 SEQ ID NO:229

MKFVNHIEPVAPRRAGGAVAEVYAEARREFGRLPEPLAMLSPDEGLLTAGWATLRETLLVGQVPRG 
RKEAVAAAVAASLRCPWCVDAHTTMLYAAGQTDTAAAILAGTAPAAGDPNAPYVAWAAGTGTPAGP
PAPFGPDVAAEYLGTAVQFHFIARLVLVLLDETFLPGGPRAQQLMRRAGGLVFARKVRAEHRPGRST
RRLEPRTLPDDLAWATPSEPIATAFAALSHHLDTAPHLPPPTRQWRRWGSWHGEPMPMSSRWTN 
EHTAELPADLHAPTRLALLTGLAPHQVTDDDVAMRSLLDTDAALVGALAWAAFTAARRIGTWIGAAA
EGQVSRQNPTG 

>Rv2163c pbpB TB.seq 2425049:2427085 MW:72506 SEQ ID NO:230 
VSRAAPRRASQSQSTRPARGLRRPPGAQEVGQRKRPGKTQKARQAQEATKSRPATRSDVAPAGR
STRARRTRQWDVGTRGASFVFRHRTGNAVILVLMLVAATQLFFLQVSHAAGLRAQAAGQLKVTDV 
QPAARGSIVDRNNDRLAFTIEARALTFQPKRIRRQLEEARKKTSAAPDPQQRLRDIAQEVAGKLNNKP 
DAAAVLKKLQSDETFNAIJ^RAVDPAVASAICAKYPEVGAERQDLRQYPGGSLAANWGGIDWDGHG 
LLGLEDSLDAVI^GTDGSVTYDRGSDGWIPGSYRNRHKAVHGSTWLTLDNDIQFYVQQQVQQAK 

NLSGAHNVSAWLDAKTGEVLAMANDNTFDPSQDIGRQGDKQLGNPAVSSPFEPGSVNKIVAASAVI
EHGLSSPDEVLQVPGSIQMGGVTVHDAWEHGVMPYTTTGVFGKSSNVGTLMLSQR 
RKFGLGQRTGVGLPGESAGLVPPIDQWSGSTFANLPIGQGLSMTLLQMTGMYQAIANDGVRVPPRII 
KATVAPDGSRTEEPRPDDIRWSAQTAQTVRQMLRAWQRDPMGYQQGTGPTAGVPGYQMAGKT 
GTAQQINPGCGCYFDDVYWITFAGIATADNPRYVIGIMLDNPARNSDGAPGHSAAPLFHNIAGWLMQ 

RENVPLSPDPGPPLVLQAT

>Rv2165c - TB.seq 2428236:2429423 MW:42498 SEQ ID NO:231 

VQTRAPWSLPEATLAYFPNARFVSSDRDLGAGAAPGIAASRSTACQTWGGITVADPGSGPTGFGHV 
PVLAQRCFELLTPALTRYYPDGSQAVLLDATIGAGGHAERFLEGLPGLRLIGLDRDPTALDVARSRLV 
RFADRLTLVHTRYDCLGAALAESGYAAVGSVDGILFDLGVSSMQLDRAERGFAYATDAPLDMRMDP
TTPLTAADIVNTYDEAALADILRRYGEERFARRIAAGIVRRRAKTPFTSTAELVALLYQAIPAPARRVGG 
HPAKRTFQALRIAVNDELESLRTAVPAALDALAIGGRIAVLAYQSLEDRIVKRVFAEAVASATPAGLPV 
ELPGHEPRFRSLTHGAERASVAEIERNPRSTPVRLRALQRVEHRAQSQQWATEKGDS 

>Rv2166c - TB.seq 2429428:2429856 MW:15912 SEQ ID NO:232

MFLGTYTPKLDDKGRLTLPAKFRDAUVGGLMWKSQDHSLAVYPRAAFEQLARRASKAPRSNPEAR 

AFLRNLAAGTDEQHPDSQGRITLSADHRRYASLSKDCWIGAVDYLEIWDAQAWQNYQQIHEENFSA 

ASDEALGDIF 

>Rv2197c - TB.seq 2461505:2462146 MW:22481 SEQ ID NO:233 
MVSRYSAYRRGPDVISPDVIDRILVGACAAVWLVFTGVSVAAAVALMDLGRGFHEMAGNPHTTWVL
YAVIWSALVIVGAIPVLLRARRMAEAEPATRPTGASVRGGRSIGSGHPAKRAVAESAPVQHADAFEV 
AAEWSSEAVDRIWLRGTNA/LTSAIGIALIAVAAATYLMAVGHDGPSWISYGLAGWTAGMPVIEW 
RQLRRWAPQSS 

>Rv2198c - TB.seq 2462149:2463045 MW:30955 SEQ ID NO:234 
MSGPNPPGREPDEPESEPVSDTGDERASGNHLPPVAGGGDKLPSDQTGETDAYSRAYSAPESEHV
TGGPWPADLRLYDYDDYEESSDLDDELAAPRWPVVWGVAAIIAAVALWSVSLLWRPHTSKLATG 
DTTSSAPPVQDEITTTKPAPPPPPPAPPPTTEIPTATETQTNnVTPPPPPPPATTTAPPPA 

PPTTTTPTGPRQ\nYSWGTKAPGDIISVTWDAAGF^RTQHNVYIPWSMTVTPISQSDVGSVEASSL 
FRVSKLNCSITTSDGTVLSSNSNDGPQTSC 

>Rv2199c - TB.seq 2463234:2463650 MW:14866 SEQ ID NO:235
MHIEARLFEFVAAFFVVTAVLYGVLTSMFATGGVEWAGTTALALTGGMALIVATFFRFVARRLDSRPE
DYEGAEISDGAGELGFFSPHSWWPIMVALSGSVAAVGIALWLPWLIAAGVAFILASAAGLVFEYYVGP 
EKH 

>Rv2200c ctaC TB.seq 2463661:2464749 MW:40449 SEQ ID NO:236 
WPRGPGRLQRLSQCRPQRGSGGPARGLRQLALAAM

HLNRELWIGAVIASI^VGVIVWGLIFWSAVFHRKKNTDTELP 

VWQEKMLQIAKDPEWIDITSFQWNWKFGYQRVNFKDGTLTYDGADPERKRAMVSKPEGKDKYGE 
ELVGPVRGLNTEDRTYLNFDKVETLGTSTEIPN^VLPSGKRIEFQMASADVIHARWPEFLFKRDVMP 
NPVANNSVNVFQIEEIWGAF^GHCAEMCGTYHSMMNFEVRNAn-PNDFKAYLQQRIDGKTNAEALR 
Al NQPPLAVTTHPFDTRRGELAPQPVG

>Rv2427c proA g-glutamyl phosphate reductase TB.seq 2724231:2725475 MW:43746
SEQ ID NO:237 

MTVPAPSQLDLRQEVHDAARRARVAARRUVSLPTWKDRALHAAADELUM 
EADTPAAMLDRLSLNPQRVDGIAAGLRQVAGLRDPVGEVLRGYTLPNGLQLRQQRVPLGWGMIYE
GRPNNTTVDAFGLTLKSGNAALLRGSSSAAKSNEALVAVLRTALVGLELPADAVQLLSAADRATVTHLI 
QARGLVDWIPRGGAGLIEAWRDAQVPTIETGVGNCHVWHQAADLDVAERILLNSKTRRPSVCNA 
AETLLVDAAIAETALPRLLAALQHAGVTVHLDPDEADLRREYLSLDIAVAWDGVDAAIAHINEYGTGH 
TEAIVTTNLDAAQRFTEQIDAAAVMVNASTAFTDGEQFGFGAEIGISTQKLHARGPMGLPELTSTKWI 
AWGAGHTRPA

>Rv2438c - similar to YHN4_YEAST P38795 TB.seq 2734793:2737006 MW:80492 
SEQ ID NO:238 

MGLLGGQSGPRVGSGPVGSIPTPVNAAICQQRGGFHGVERGYSAGDSGVLTSLGDNERTMNFYSA 
YQHGFVRVAACTHHTTIGDPAANAASVLDMARACHDDGAALAVFPELTLSGYSIEDVLLQDSLLDAV
EDALLDL\n"ESADLLPVLWGAPLRHRHRIYNTAWIHRGAVLGWPKSYLPTYREFYERRQMAPGD 
GERGTIRIGGADVAFGTDLLFAASDLPGFVLHVEICEDMFVPMPPSAEAAU^GATVLANLSGSPITIGR 
AEDRRLLARSASARCLAAYWAAAGEGESTTDU\WDGQTMIWENGALLAESERFPKGVRRSVADVD 
TELLRSERLRMGTFDDNRRHHRELTESFRRIDFALDPPAGDIGLLREVERFPFVPADPQRLQQDCYE 
AYNIQVSGLEQRLRALDYPKWIGVSGGLDSTHALIVATHAMDREGRPRSDILAFALPGFATGEHTKN
NAIKLARALGVTFSEIDIGDTARLMLHTIGHPYSVGEKWDWFENVQAGLRTDYLFRIANQRGGIVLG 
TGDLSELALGWSTYGVGDQMSHYNVNAGVPKTLIQHLIRVVVISAGEFGEKVGEVLQSVLDTEITPELI 
PTGEEELQSSEAKVGPFALQDFSLFQVLRYGFRPSKJAFLAWHAWNDAERGNWPPGFPKSERPSYS
LAEIRHWLQIFVQRF^SFSQFKRSALPNGPKVSHGGALSPRGDWRAPSDMSARIWLDQIDREVPKG 

>Rv2439c proB glutamate 5-kinase TB.seq 2737118:2738245 MW:38789 SEQ ID NO:239
MRSPHRDAIRTARGLNA/KVGTTALTTPSGMFDAGRLAGLAEAVERRMKAGSDWIVSSGAIAAGIEPL
GLSRRPKDI^TKQAAASVGQVALVNSWSAAFARYGRWGQVLLTAHDISMRVQHTNAQRTLDRLRA 
LHAVAIVNENDTVATNEIRFGDNDRLSALVAHLVGADALVLLSDIDGLYDCDPRKTADATFIPEVSGPA 
DLDGWAGRSSHLGTGGMASIWAAALLAADAGVPVLLAPAADAATALADASVGTVFAARPARLSAR 
RFVVVRYAAEATGALTLDAGAVRAWRQRRSLLAAGITAVSGRFCGGDWELRAPDAAMVARGWAY 
DASELATMVGRSTSELPGELRRPWHADDLVAVSAKQAKQV

>Rv2440c obg Obg GTP-binding protein TB.seq 2738248:2739684 MW:50430 
SEQ ID NO:240 

VPRFVDRWIHTRAGSGGNGCASVHREKFKPLGGPDGGNGGRGGSIVFWDPQVHTLLDFHFRPHL 
TAASGKHGMGNNRDGAAGADLEVKVPEGTWLDENGRLLADLVGAGTRFEAAAGGRGGLGNAALA
SRVRKAPGFALLGEKGQSRDLTLELKTVADVGLVGFPSAGKSSLVSAISAAKPKIADYPFTTLVPNLG 
WSAGEHAFTVADVPGLIPGASRGRGLGLDFLRHIERCAVLVHWDCATAEPGRDPISDIDALETELA 
CYTPTLQGDAALGDLAARPRAWLNKIDVPEARELAEFVRDDIAQRGWPVFCVSTATRENLQPLIFGL 
SQMISDYNAARPVAVPRRPVIRPIPVDDSGFTVEPDGHGGFWSGARPERWIDQTNFDNDEAVGYL 
ADRLARLGVEEELLRLGARSGCAVTIGEMTFDWEPQTPAGEPVAMSGRGTDPRLDSNKRVGAAER
KAARSRRREHGDG 

>Rv2441c rpmA 50S ribosomal protein L27 TB.seq 2739773:2740030 MW:8969
SEQ ID NO:241 

MAHKKGASSSRNGRDSAAQRLGVKRYGGQWKAGEILVRQRGTKFHPGVNVGRGGDDTLFAKTAG
AVEFGIKRGRKTVSIVGSTTA 

>Rv2442c rplU 50S ribosomal protein L21 TB.seq 2740048:2740359 MW:11152
SEQ ID NO:242 
MMATYAIVKTGGKQYKVAVGDWKVEKLESEQGEK^
HTKGPKIRIHKFKNKTGYHKRQGHRQQLTVLKVTGIA 

>Rv2448c valS valyl-tRNA synthase TB.seq 2747596:2750223 MW:97822 SEQ ID NO:243
MLPKSWDPAAMESAIYQKWLDAGYFTADPTSTKPAYSIVLPPPNVTGSLHMGHALEHTMMDALTRR 
KRMQGYEVLWQPGTDHAGIATQSWEQQI^VDGKTKEDLGRELFVDKVAA/DWKRESGGAIGGQMR
RLGDGVDWSRDRFTMDEGLSRAVRTIFKRLYDAGLIYRAERLVNWSPVLQTAISDLEVNYRDVEGEL 
VSFRYGSLDDSQPHIWATTRVETMLGDTAIAVHPDDERYRHLVGTSLAHPFVDRELAIVADEHVDPE 
FGTGAVKVTPAHDPNDFEIGVRHQLPMPSILDTKGRIVDTGTRFDGMDRFEARVAVRQALAAQGRV
VEEKRPYLHSVGHSERSGEPIEPRLSLQWWVRVESLAKAAGDAVRNGDWIHPASMEPRWFSWVD 
DMHDWCISRQLWWGHRIPimGPDGEQVCVGPDETPPQGWEQDPDVLDTWFSSALWPFSTLGW 
PDKTAELEKFYPTSVLWGYDILFFVVVARMMMFGTFVGDDAAITLDGRRGPQVPFTDVFLHGLIRDE 
SGRKMSKSKGNVIDPLDVVVEMFGADALRFTLARGASPGGDLAVSEDAVRASRNFGTKLFNATRYAL
LNGAAPAPLPSPNELTDADRWILGRLEEVRAEVDSAFDGYEFSRACESLYHFAWDEFCDWYLELAK 
TQI^QGLTHTTAVLAAGLDTLLRLLHPVIPFLTEALWLALTGRESLVSADWPEPSGISVDLVAAQRI N D 
MQKLNTTEVRRFRSDQGU^RQKVPARMHGVRDSDLSNQVAAWSU^WLTEPGPDFEPSVSLEVRL 
GPEMNRTVWELDTSGTIDVAAERRRLEKELAGAQKELASTAAKLANADFLAKAPDAVIAKIRDRQRV 
AQQETERITTRLAALQ

>Rv2482c plsB2 TB.seq 2786915:2789281 MW:88284 SEQ ID NO:244 

VTKPAADASAVLTAEDTLVLASTATPVEMELIMGWLGQQRARHPDSKFDILKLPPRNAPPAALTALVE 
QLEPGFASSPQSGEDRSIVPVRVIWLPPADRSRAGKVAALLPGRDPYHPSQRQQRRILRTDPRRAR 

WAGESAIWSELRQQWRDTWAEHKRDFAQFVSRRALUVLARAEYRILGPQYKSPRLVKPEM

RFRAGLDRIPGATVEDAGKMLDELSTGWSQVSVDLVSVLGRLASRGFDPEFDYDEYQVAAMRAALE 
AHPAVLLFSHRSYIDGWVPVAMQDNRLPPVHMFGGINLSFGLMGPLMRRSGMIFIRRNIGNDPLYK 
YVLKEYVGYWEKRFNLSWSIEGTRSRTGKMLPPKLGLMSYVADAYLDGRSDDILLQGVSICFDQLH 
EITEYAAYARGAEKTPEGLRWLYNFIKAQGERNFGKIYVRFPEAVSMRQYLGAPHGELTQDPAAKRL 

ALQKMSFEVAWRILQATPVTATGLVSALLLTTRGTALTLDQLHHTLQDSLDYLERKQSPVSTSALRLR
SREGVRAAADALSNGHPWRVDSGREPVWYIAPDDEHAAAFYRNSVIHAFLETSIVELAUMHAK 
GDRVAAFWAQAMRLRDLLKFDFYFADSTAFRANIAQEMAWHQDWEDHLGVGGNEIDAMLYAKRPL 
MSDAMLRVFFEAYEIVADVLRDAPPDIGPEELTELALGLGRQFVAQGRVRSSEPVSTLLFATARQVAV 
DQELIAPAADLAERRVAFRRELRNILRDFDYVEQIARNQFVACEFKARQGRDRI 

>Rv2509 - putative oxidoreductase TB.seq 2824676:2825479 MW:28014 SEQ ID NO:245 
MPIPAPSPDARAWTGASQNIGAALATELAARGHHLIVTARRED^^ 

ADPQERSKLADELAARPISILCANAGTATFGPIASLDLAGEKTQVQLNAVAVHDLTLAVLPGMIERKAG
GILISGSAAGNSPIPYNATYAATKAFVNTFSESLRGELRGSGVHVTVLAPGPVRTELPDASEASLVEKL 
VPDFLWISTEHTARVSLNALERNKMRWPGLTSKAMSVASQYAPRAIVAPIVGAFYKRLGGS

>Rv2524c fas fatty acid synthase TB.seq 2840124:2849330 MW:326226 SEQ ID NO:246 
\n"IHEHDRVSADRGGDSPHTTHALVDRLMAGEPYAVAFGGQGSAWLETLEELVSATGlETELATLVG 
EAELLLDPVTDELIWRPIGFEPLQWVRALAAEDPVPSDKHL^ 
DLVATPPVAMAGHSQGVI^VEALKAGGARDVELFALAQLIGAAGTLVARRRGISVLGDRPPMVSVTN
ADPERIGRLLDEFAQDVRTVLPPVLSIRNGRRAWITGTPEQLSRFELYCRQISEKEEADRKNKVRGG 
DVFSPVFEPVQVEVGFHTPRLSDGIDIVAGWAEKAGLDVALARELADAILIRKVDWVDEITRVHAAGA 
RWILDLGPGDILTRLTAPVIRGLGIGIVPAATRGGQRNLFTVGATPEVARAWSSYAPTWRLPDGRVK
LSTKFTRLTGRSPILUVGMTPTWDAK^^ 

YQFNALFLDPYLWKLQVGGKRLVQKARQSGAAIDGWISAGIPDLDEAVELIDELGDIGISHWFKPGT 
IEQIRSVIRIATEVPTKP\^MHVEGGRAGGHHSWEDLDDU_U^TYSELRSRANITVCVGGGIGTPRRAA 
EYLSGRWAC^YGFPLMPIDGILVGTAA^TKESTTSPSVKRMLVDTQGTDQWISAGKAQGGMASSR
SQLGADIHEIDNSASRCGRLLDEVAGDAEAVAERRDEIIAAMAKTAKPYFGDVADMTYLQWLRRYVE 
UVIGEGNSTADTASVGSPWU^TWRDRFEQMLQRAEARLHPQDFGPIQTLFTDAGLLDNPQQAIAAL 
LARYPDAETVQLH PADVPFFVTLCKTLGKPVN FVPVI DQDVRRWWRSDSLWQAHDARYDADAVCI I P 
GTASVAGITRMDEPVGELLDRFEQAAIDEVLGAGVEPKDVASRRLGRADVAGPLAWLDAPDVRWA 

GRTWNPVHRIADPAEWQVHDGPENPRATHSSTGARLQTHGDDVALSVPVSGTWVDIRFTLPANTV
DGGTPVIATEDATSAMRTVI^IAAGVDSPEFLPAVANGTATLTVDWHPERVADHTGVTATFGEPLAP 
SLTNVPDALVGPCWPAVFAAIGSAVTDTGEPWEGLLSLVHLDHAARWGQLPTVPAQLTVTATAAN 
ATDTDMGRWPVSWVTGADGAVIATLEERFAILGRTGSAELADPARAGGAVSANATDTPRRRRRDV 
TITAPVDMRPFAWSGDHNPIHTDRAAALUKGLESPIVHGMWLSAAAQHAVTATDGQARPPARLVG 

mARFLGMVRPGDEVDFRVERVGIDQGAEIVDVAARVGSDLVMSASARLAAPKTVYAFPGQGIQHK
GMGMEVRARSKAARKVWDTADKFTRDTLGFSVLHWRDNPTSIIASGVHYHHPDGVLYLTQFTQVA 
MATVAAAQVAEMREQGAFVEGAIACGHSVGEYTAUVCVTGIYQLEALLEMVFHRGSKMHDIVPRDEL 
GRSNYRLAAIRPSQIDLDDADVPAFVAGIAESTGEFLEIVNFNLRGSQYAIAGTVRGLEALEAEVERRR 
ELTGGRRSFILVPGIDVPFHSRVLRVGVAEFRRSLDRVMPRDADPDLIiGRYIPNLVPRLFTLDRDFIQ 

EIRDLVPAEPLDEIUVDYDTWLRERPREMARWFIELLAWQFASPVRWIETQDLLFIEEAAGGLGVERF
VEIGVKSSPTVAGLATNTLKLPEYAHSTVEVLNAERDAAVLFATDTDPEPEPEEDEPVAESPAPDWS 
EAAPVAPAASSAGPRPDDLVFDAADATLALIALSAKMRIDQIEELDSIESITDGASSRRNQLLVDLGSE 
LNLGAIDGAAESDLAGLRSQVTKLARTYKPYGPVLSDAINDQLRTVLGPSGKRPGAIAERVKKTWELG 
EGWAKHVTVEVALGTREGSSVRGGAMGHLHEGALADAASVDKVIDAAVASVAARQGVSVALPSAG 

SGGGATIDAAALSEFTDQITGREGVLASAARLVLGQLGLDDPVNALPAAPDSELIDLVTAELGADWPR
LVAPVFDPKKAWFDDRWASAREDLVKLWLTDEGDIDADWPRL^ERFEGAGHWATQATVVWQGKS 
L^GRQIHASLYGRIAAGAENPEPGRYGGEVAVVTGASKGSIAASWARLLDGGAWIATTSKLDEER 
LAFYRTLYRDHARYGAALWLVAANMASYSDVDALVEWIGTEQTESLGPQSIHIKDAQTPTLLFPFAAP 
RWGDLSEAGSRAEMEMKVLLWAVQRLIGGLSTIGAERDIASRLHWLPGSPNRGMFGGDGAYGEA 

KSALDAWSRWHAESSWAARVSLAHALIGmRGTGLMGHNDAIVAAVEEAGNmYSTDEMAALLLD
LCDAESKVAAARSPIKADLTGGLAEANLDMAELAAKAREQMSAAAAVDEDAEAPGAIAALPSPPRGF 
TPAPPPQWDDLDVDPADLWIVGGAEIGPYGSSRTRFEMEVENELSAAGVLEUVWTTGLIRWEDDP 
QPGWYDTESGEMVDESELVQRYHDAWQRVGIREFVDDGAIDPDHASPLLVSVFLEKDFAFWSSE 
ADARAFVEFDPEHWIRPVPDSTDWQVIRKAGTEIRVPRKTKLSRWGGQIPTGFDPTVWGISADMA 

GSIDRU^VWNMVATVDAFLSSGFSPAEVMRWHPSLVANTQGTGMGGGTSMQTMYHGNLLGRNKP
NDIFQEVLPNIIAAHWQSYVGSYGAMIHPVAACATAAVSVEEGVDKIRLGKAQLWAGGLDDLTLEGII 
GFGDMAATADTSMMCGRGIHDSKFSRPNDRRRLGFVEAQGGGTILLARGDLALRMGLPVLAWAFA 
QSFGDGVHTSIPAPGLGALGAGRGGKDSPU^RALAKLGVAADDVAVISKHDTSTLANDPNETELHER
LADALGRSEG APLFWSQKSLTGH AKGGAAVFQMMGLCQI LRDGVI PPN RSLDCVDD ELAGS AH FV 
WVRDTLRLGGKFPLKAGMLTSLGFGHVSGLVALVHPQAFIASLDPAQRADYQRRADARLLAGQRRL 
ASAIAGGAPMYQRPGDRRFDHHAPERPQEASMLLNPAARLGDGEAYIG 


>Rv2555c alaS alanyl-tRNA synthase TB.seq 2873772:2876483 MW:97326 SEQ ID NO:247 
VQTHEIRKRFLDHFVKAGHTEVPSASVILDDPNLLFVNAGMVQFVPFFLGQRTPPYPTATSIQKCIRTP 
DIDEVGITTRHNTFFQMAGNFSFGDYFKRGAIELAWALLTNSLAAGGYGLDPERIWTTVYFDDDEAV 
RLWQEVAGLPAERIQRRGMADNYWSMGIPGPCGPSSEIYYDRGPEFGPAGGPIVSEDRYLEVWNL 

VFMQNERGEGTTKEDYQILGPLPRKNIDTGMGVERIALVLQDVHNVYETDLLRPVIDTVARVAARAYD
VGNHEDDVRYRIIADHSRTAAILIGDGVSPGNDGRGYNA-RRLLRRVIRSAKLLGIDAAIVGDLMATVRN 
AMGPSYPELVADFERISRIAVAEETAFNRTLASGSRLFEEVASSTKKSGATVLSGSDAFTLHDTYGFPI 
ELTLEMAAETGLQVDEIGFRELMAEQRRRAKADAAARKHAHADLSAYRELVDAGATEFTGFDELRS 
QARILGIFVDGKRVPWAHGVAGGAGEGQRVELVLDRTPLYAESGGQIADEGTISGTGSSEAARAAV 

TDVQKIAKTLWVHRVNVESGEFVEGDWIMVDPGWRRGATQGHSGTHMVHAALRQVLGPNAVQA
GSLNRPGYLRFDFNWQGPLTDDQRTQVEEVTNEAVQADFEVRTFTEQLDKAKAMGAIALFGESYPD 
EVRWEMGGPFSLELCGGTHVSNTAQIGPVTILGESSIGSGVRRVEAYVGLDSFRHLAKERALMAGL 
ASSLKVPSEEVPARVANLVERLRAAEKELERVRMASARAAATNAAAGAQRIGNVRLVAQRMSGGMT 
AADLRSLIGDIRGKLGSEPAWALIAEGESQ7VPYAVAANPAAQDLGIRANDLVKQLAVAVEGRGGGK 

ADLAQGSGKNPTGIDAALDAVRSEtAVIARVG

>Rv2580c hisS histidyl-tRNA synthase TB.seq 2904822:2906090 MW:45118 SEQ ID NO:248
VTEFSSFSAPKGVPDYVPPDSAQFVAVRDGLLAAARQAGYSHIELPIFEDTALFARGVGESTDWSKE 
MYTFADRGDRSVTLRPEGTAGWRAVIEHGLDRGALPVKLCYAGPFFRYERPQAGRYRQLQQVGV 
EAIGVDDPALDAEVIAIADAGFRSLGLDGFRLEITSLGDESCRPQYRELLQEFLFGLDLDEDTRRRAGI
NPLRVLDDKRPELRAMTASAPVLLDHLSDVAKQHFDTVLAHLDALGVPWINPRMVRGLDYYTKTAF 
EFVHDGLGAQSGIGGGGRYDGLMHQLGGQDLSGIGFGLGVDRTVLALRAEGKTAGDSARCDVFGV 
PLGEAAKLRI^VLAGRLRAAGVRVDUKYGDRGLKGAMRAAARSGARVALVAGDRDIEAGTVAVKDL 
TTGEQVSVSMDSWAEVISRLAG 


>Rv2614c thrS threonyl-tRNA synthase TB.seq 2941190:2943265 MW:77123 SEQ ID NO:249
MSAPAQPAPGVDGGDPSQARIRVPAGTTAATAVGEAGLPRRGTPDAIWVRDADGNLRDLSVWPD 
VDTDITPVAANTDDGRSVI RHSTAHVLAQAVQELFPQAKLGI GPPITDGFYYDFDVPEPFTPEDLAALE 
KRMRQIVKEGQLFDRRWESTEQARAELANEPYKLELVDDKSGDAEIMEVGGDELTAYDNLNPRTR 
ERNMGDLCRGPHIPTTKHIPAFKLTRSSAAYWRGDQKNASLQRIYGTAWESQEALDRHLEFIEEAQR
RDHRKLGVELDLFSFPDEIGSGLAVFHPKGGIVRRELEDYSRRKHTEAGYQFVNSPHITKAQLFHTSG 
HLDWYADGMFPPMHIDAEYNADGSLRKPGQDYYLKPMNCPMHCLIFRARGRSYRELPLRLFEFGTV 
YRYEKSGWHGLTRVRGLTMDDAHIFCTRDQMRDELRSLLRFVLDLLADYGLTDFYLELSTKDPEKF
VGAEE\AA(EEATWLAEVGAESGLELVPDPGGAAFYGPKISVQVKDALGRTWQMSTIQLDFNFPERF 
GLEYTAAOGTRHRPN^IHRALFGSIERFFGILTEHYAGAFPAVVLAPVQWGIPVADEHVAYLEEVATQ 
LKSHGVRAEVDASDDRMAKKIVHHTNHKVPFMNA-AGDRDVAAGAVSFRFGDRTQINGVARDDAVAA 
IVAWIADRENAVPTAELVKVAGRE

>Rv2697c dut deoxyuridine triphosphatase TB.seq 3013683:3014144 MW:15772 SEQ ID NO:250 
VSTTLAIVRLDPGLPLPSRAHDGDAGVDLYSAEDVELAPGRRALVRTGVAVAVPFGMVGLVHPRSGL 
ATRVGLSIVNSPGTIDAGYRGEIKVALINLDPAAPIWHRGDRIAQLLVQRVELVELVEVSSFDEAGLAS 
TSRGDGGHGSSGGHASL

>Rv2782c pepR protease/peptidase, M16 family (insulinase) TB.seq 3089045:3090358 MW:47074 
SEQ ID NO:251 

MPRRSPADPAAALAPRRmPGGLR\An"EFLPAVHSASVG\WVGVGSRDEGATVAGAAHFLEHLL^ 
KSTPTRSAVDIAQAMDAVGGELNAFTAKEHTCYYAIHVLGSDLPLAVDLVADWLNGRCAADDVEVER
DWLEEIAMRDDDPEDALADMFUVU-FGDHPVGRPVIGSAQSVSVMTRAQL^ 
AAAGNVDHDGLVALVREHFGSRLVRGRRPVAPRKGTGRVNGSPRLTLVSRDAEQTHVSLGIRTPGR 
GWEHRWALSVLHTALGGGLSSRLFQEVRETRGLAYSVYSALDLFADSGALSWAACLPERFADVMR 
VTADVLESVARDGITEAECGIAKGSLRGGLVLGLEDSSSRMSRLGRSELNYGKHRSIEHTLRQIEQVT 
VEEVNAVARHLLSRRYGAAVLGPHGSKRSLPQQLRAMVG

>Rv2783c gpsI pppGpp synthase and polyribonucleotide phosphorylase TB.seq
3090339:3092594 MW:79736 SEQ ID NO:252 

MSAAEIDEGVFETTATIDNGSFGTRTIRFETGRL^QAAGAWAYLDDDNMLLSATTASKNPKEHFDF 
FPLTVDVEERMYAAGRIPGSFFRREGRPSTDAILTCRLIDRPLRPSFVDGLRNEIQIWTILSLDPGDLY 

DVLAINAASASTQLGGLPFSGPIGGVRVALIDGTWVGFPWDQIERAVFDMWAGRIVEGDVAIMMVE
AEATENWELVEGGAQAPTESWAAGLEAAKPFIAALCTAQQELADAAGKSGKPTVDFPVFPDYGED 
VYYSVSSVATDEL^AALTIGGKAERDQRIDEIKTQWQRLADTYEGREKEVGAALRALTKKLVRQRILT 
DHFRIDGRGITDIRALSAEVAWPRAHGS^FERGETQILGNnTLDMIKMAQQIDSLGPETSKRYMHH 
YNFPPFSTGETGRVGSPKRREIGHGALAERALVPVLPSVEEFPYAIRQVSEALGSNGSTSMGSVCAS 

TLALLNAGVPLKAPVAGIAMGLVSDDIQVEGAVDGWERRFVTLTDILGAEDAFGDMDFKVAGTKDFV
TALQLDTKLDGI PSQVLAGALEQAKDARLTI LEVM AEAI DRPDEMSPYAPRVTTI KVPVDKI GEVI G PK 
GKVINAITEETGAQISIEDDGWFVGATDGPSAQAAIDKINAIANPQLPWGERFLGTVVKTTDFGAFVS 
LLPGRDGLVH I SKLGKGKRI AKVEDWNVGDKLRVEI ADI DKRGKI SLI LVADEDSTAAATDAATVTS 
>Rv2793c truB tRNA pseudouridine 55 synthase TB.seq 3102364:3103257 MW:31821 

SEQ ID NO:253

MSATGPGIWIDKPAGMTSHDWGRCRRIFATRRVGHAGTLDPMATGVLVIGIERATKILGLLTAAPKS 
YAATI RLGQTTSTEDAEGQVLQSVPAKHLTI EAI DAAMERLRGEIRQVPSSVSAI KVGGRRAYRLARQ 
GRSVQLEARPI Rl DRFELLAARRRDQLI Dl DVEI DCSSGTYI RALARDLGDALGVGGH VTALRRTRVG R
FELDQARSLDDLAERPALSLSLDEACLLMFARRDLTAAEASAAANGRSLPAVGIDGVYAACDADGRVI 
ALLRDEGSRTRSVAVLRPATMHPG 

>Rv2797c - TB.seq 3105619:3107304 MW:58761 SEQ ID NO:254

VPLTVADIDRWNAQAVREVFHAASARAEVTFEASRQLAALSIFANSGGKTAEAAAHHNAGIRRDLDA 
HGNEALAVARAADRAADGIVKVQSELAALRHAAAAAELTIDALINRWPIPGLRSTEAQWARTLAKQT 
ELQAELDAIMAEANAVDEELASAVNMADGDAPIPADSGPPVGPEGLTPTQLASDANEERLREERARL 
(^HLERLQAEYDQLSVRAARDYHNGILDGDAVGRLAALTDELSAARGRLGELDAVDEALSRAPETYL 
TQLQIPEDPNQQVLAAVAVGNPDTAANVSVTVPGVGSTTRGALPGMVTEARDLRSEVIRQLNAAGK
PASVATIAWMGYHPPPNPLDTGSAGDLWQTMTDGC^HAGAADLSRYLQQVRANNPSGHLTVLGHS 
YGSLTASl^LQDLDAQSAHPVNDWFYGSPGLELYSPAQLGLDHGHAWMQAPHDLITNLVAPLAPL 
HGWGLDPYLTPGFTELSSQAGFDPGGIWRDGVYAHGDYPRSFLDAAGQPQLRMSGYNLAAIAAGL 
PDNTVGPPLLPPILGGGMPAAPGPALRGGR 


>Rv2864c ponA2 TB.seq 3175454:3177262 MW:63015 SEQ ID NO:255 

MWKTTLASATSGLLLLAWAMSGCTPRPQGPGPAAEKFFAALAIGDTASAAQLSDNPNEAREALNA 
AWAGLQAAHLDAQVLSAKYAEDTGWAYRFSWHLPKDRIWTYDGQLKMARDEGRWHVRWTTSGL 
HPKLGEHQTFALRADPPRRASVNEVGGTDVLVPGYLYHYSLDAGQAGRELFGTAHAWGALHPFDD 

TLNDPQLLAEC^SSSTQPLDL\n"LHADDSNRVAAAIGQLPGWITPQAELLPTDKHFAPAVLNDVKKA
WDELDGKAGWRWSVNQNGVDVSVLHEVAPSPASSVSITLDRWQNAAQHAVNTRGGKAMIWIK 
PSTGEIU^AQNAGADADGPVATTGLYPPGSTFKMITAGAAVERDLATPETLLGCPGEIDIGHRTIPNY 
GGFDLGWPMSRAFASSCNTTFAELSSRLPPRGLTQAARRYGIGLDYQVDGI7TVTGSVPPTVDLAE 
RTEDGFGQGKVI^SPFGMALVAAWAAGKTPVPQLIAGRPTAVEGDATPISQKMIDALRPMMRLNA/T 

NGTAKEIAGCGEVFGKTGEAEFPGGSHSWFAGYRGDLAFASLIVGGGSSEYAVRMTKVMFESLPPG
YLA 

>Rv2868c gcpE TB.seq 3179368:3180528 MW:40451 SEQ ID NO:256 

\m/GLGMPQPPAPTLAPRRATRQLMVGNVGVGSDHPVSVQSMCTTKTHDVNSTLQQIAELTAAGC 
DIVRVACPRQEDADALAEIARHSQIPWADIHFQPRYIFAAIDAGCAAVRVNPGNIKEFDGRVGEVAKA 

AGAAGIPIRIGVNAGSLDKRFMEKYGKATPEALVESALWEASLFEEHGFGDIKISVKHNDPWMVAAY
ELLAARCDYPLHLGVTEAGPAFQGTIKSAVAFGALLSRGIGDTIRVSLSAPPVEEVKVGNQVLESLNL 
RPRSLEIVSCPSCGRAQVDVYTLANEVTAGLDGLDVPLRVAVMGCWNGPGEAREADLGVASGNGK 
GQIFVRGEVIKTVPEAQIVETLIEEAMRLAAEMGEQDPGATPSGSPIVTVS 
>Rv2869c - TB.seq 3180548:3181759 MW:42835 SEQ ID NO:257 

MMF\n"GIVLFAL^ILISVALHECGHMWVARRTC

FCDIAGMTPVEELDPDERDRAMYKQATWKRVAVLFAGPGMNI^ICLVLIYAIALVWGLPNLHPPTRAV 
IGETGCVAQEVSQGKLEQCTGPGPAALAGIRSGDNAA/KVGDTPVSSFDEMAAAVRKSHGSVPIWE 
RDGTAIVTVVDIESTQRWIPNGQGGELQPATVGAIGVGAARVGPVRYGVFSAMPATFANTTGDLTVEV
GKAU\ALPTKVGALVRAIGGGQRDPQTPISWGASHGGDWDH^^ 

LPFDGGHIAVAVFERIRNMVRSARGKVAAAPVNYLKLLPATYWLVLWGYMLLTVTADLVNPIRLFQ 
>Rv2870c - TB.seq 3181770:3183077 MW:45324 SEQ ID NO:258 
VATGGRWIRRRGDNEWAHNDEVTNSTDGRADGRLRWVLGSTGSIGTQALQVIADNPDRFEWG
LAAGGAHLDTLLRQRAQTGVTN I AVADEH AAQRVGDI PYHGSDAATRLVEQTEADWLN ALVGALGL 
RPTLAALKTGARLALANKESLVAGGSLVLRAARPGQIVPVDSEHSAl^^ 

GGPFRGWSMDLEHWPEC^GAHPTWSMGPMNTLNSASLVNKGLEVIETHLLFGIPYDRIDVWHP 
QSIIHSh4\n"FIDGSTIAQASPPDMKLPISLALGWPRRVSGAAAACDFHTASSWEFEPLDTDVFPAVEL 
ARQAGVAGGCMTAWNAANEEAAAAFUVGRIGFPAIVGIIADVLHAADQWAVEPAWDDVLDAQRWA
RERAQRAVSGMASVAIASTAKPGAAGRHASTLERS 

>Rv2922c smc member of Smc1/Cut3/Cut14 family TB.seq 3234189:3238055 MW:139610
SEQ ID NO:259 

VGAGSRFPLVDPLPSVGARPDRLRGQPRRRTRAGGRPGSARCVPEAAAAAAGRHDTGPRRQSRR 

RLVAVDGADHRVQRAVIWPLVYLKSLTLKGFKSFAAPTTLRFEPGITAWGPNGSGKSNWDAI^WV
MGEQGAKTLRGGKMEDVIFAGTSSRAPLGRAEVTVSIDNSDNALPIEYTEVSITRRMFRDGASEYEIN 
GSSCRLMDVQELLSDSGIGREMHVIVGQGKLEEILQSRPEDRRAFIEEAAGVLKHRKRKEKALRKLDT 
MAANLARLTDLTTELRRQLKPLGRQAEAAQRAAAIQADLRDARLRLAADDLVSRRAEREAVFQAEAA 
MRREHDEAAARLAVASEELAAHESAVAELSTRAESIQHTWFGLSALAERVDAWRIASERAHHLDI^ 

VAVSDTDPRKPEELEAEAQQVAVAEQQLLAELDAARARLDAARAELADRERRAAEADRAHLAAVRE
EADRREGLARLAGQVETMRARVESIDESVARLSERIEDAAMRAQQTRAEFETVQGRIGELDQGEVG 
LDEHHERTVAALRLADERVAELQSAERAAERQVASLRARIDALAVGLQRKDGAAWLAHNRSGAGLF 
GSIAQLVKVRSGYEAALAAALGPAADALAVDGLTAAGSAVSALKQADGGRAVLVLSDWPAPQAPQS 
ASGEMLPSGAQWALDLVESPPQLVGAMIAMLSGVAWNDLTEAMGLVEIRPELRAVTVDGDLVGAG 

WVSGGSDRKLSTLEVTSEIDKARSEU\AAEAU\AQL^

SAMYEQLGRLGQEARAAEEEWNRLLQQRTEQEAVRTQTLDDVIQLETQLRKAQETQRVQVAQPIDR 
C^ISAAADRARGVEVEARLAVRTAEERANAVRGRADSLRRAAAAEREARVRAQQARAARLHAAAVA 
AAVADCGRLLAGRLHRAVDGASQLRDASAAQRQQRU\AMAAVRDEVNTLSARVGELTDSLHRDEL 
ANAQAALRIEQLEQMVLEQFGMAPADLITEYGPHVALPPTELEMAEFEQARERGEQVIAPAPMPFDR 

VTQERRAKRAERALAELGRVNPLALEEFAALEERYNFLSTQLEDVKAARKDLLGWADVDARILQVFN
DAFVDVEREFRGVFTALFPGGEGRLRLTEPDDMLTTGIEVEARPPGKKITRLSLLSGGEKALTAVAML 
VAI FRARPSPFYI MDEVEAALDDVN LRRLLSLFEQLREQSQI 1 1 ITHQKPTMEVADALYGVTMQN DGITA 
VISQRMRGQQVDQLVTNSS 

>Rv2925c rnc RNAse III TB.seq 3239829:3240548 MW:25400 SEQ ID NO:260
M I RSRQPLLDALGVDLPDELLSLALTH RSYAYENGGLPTN ERLEFLGDAVLGLTITDALFH RH PDRSE
GDLAKLRASWNTQALADVARRLCAEGLGVHVLLGRGEANTGGADKSSILADGMESLLGAIYLQHGM 
EKAREVILRLFGPLLDAAPTLGAGLDWKTSLQELTMRGLGAPSYLVTSTGPDHDKEFTAVWVMDS
EYGSGVGRSKKEAEQKAAAAAWKALEVLDNAMPGKTSA 

>Rv2934 ppsD TB.seq 3262245:3267725 MW:193317 SEQ ID NO:261
MTSLAERAAQLSPNARAALARELVRAGTTFPTDICEPVAWGIG^

IEQVPPDRWDADAFYDPDPSASGBMTTKWGGFVSDVDAFDADFFGITPREAVAMDPQHRMLLEVA 
WEALEHAGIPPDSLSGTRTGVMMGLSSWDYTIVNIERRADIDAYLSTGTPHCAAVGRIAYLLGLRGPA 
VAVDTACSSSLVAIHLACQSLRLRETDVALAGGVQLTLSPFTAIALSKWSALSPTGRCNSFDANADGF 
VRGEGCGWVLKRI^DAVRCX3DRVI^WRGSATNSDGRSNGMTAPNALAQRDVITSALKLADVTPD 

SVNWETHGTGWLGDPIEFESLAATYGLGKGQGESPCALGSVKTNIGHLEAAAGVAGFIKAVLAVQR
GHIPRNLHFTRWNPAIDASATRLFVPTESAPWPAAAGPRRAAVSSFGLSGTNAHWVEQAPDTAVAA 
AGGMPWSALNVSGKTAARVASAAAVU^DWMSGPGAAAPU^DVAHTLNRHRARHAKFAWIA 
EAIAGLRALAAGQPRVGWDCDQHAGGPGRVFVYSGQGSQWASMGQQLLANEPAFAKAVAELDPI 
FVDQVGFSLQQTLIDGDEWGIDRIQPVLVGMQLALTELWRSYGVIPDAVIGHSMGEVSAAWAGALT 

PEQGLRVITTRSRLMARLSGQGAMALLELDADAAEALIAGYPQVTLAVHASPRQ7VIAGPPEQVDTVI
AAVATQNRUVRRVEVDVASHHPIIDPILPELRSAUVDLTPQPPSIPIISTTYESAQPVADADYWSANLRN 
PVRFH(^\n"AAGVDHNTFIEISPHPVLTHALTDTLDPDGSHWMSTMNRELDQTLYFHAQLAAVGVA 
ASEHTTGRLVDLPPTPWHHQRFWVTDRSAMSELAATHPLLGAHIEMPRNGDHN^ 
UVDHKVFGQPIMPAAGFAEIALAAASEALGTAADAVAPNIVINQFEVEQMLPLDGHTPLTTQLIRGGDS 

QIRVEIYSRTRGGEFCRHATAKVEQSPRECAHAHPEAQGPATGTTVSPADFYALLRQTGQHHGPAF
AALSRIVRLADGSAETEISIPDEAPRHPGYRLHPWLDAALQSVGAAIPDGEIAGSAEASYLPVSFETIR 
WRDIGRHVRCRAHLTNLDGGTGKMGRIVLINDAGHIAAEVDGIYLRRVERRAVPLPLEQKIFDAEWT 
ESPIAAVPAPEPAAETTRGSWLVLADATVDAPGKAQAKSMADDFVQQWRSPMRRVHTADIHDESAV 
LAAFAETAGDPEHPPVGVWFVGGASSRLDDEU\AARDTVWSITTWRAWGT^ 

GGLSVADDEPGTPAAASLKGLVRVLAFEHPDMRTTLVDLDITQDPLTALSAELRNAGSGSRHDDVIA
WRGERRFVERLSRATIDVSKGHPWRCK3ASYWTGGLGGLGLWARWLVDRGAGRWLGGRSDPT 
' DEQCNVLAELQTRAEIVWRGDVASPGVAEKLIETARQSGGQLRGWHAAAVIEDSLVFSMSRDNLE 
RVWAPKATGALRMHEATADCELDWWLGFSSAASLLGSPGCWVYACASAWLDALVGWRRASGLPA 
AVI N WG PWSEVGVAQALVGSVLDTI SVAEGI EALDSLLAADRI RTGVARLRADRALVAFPEI RSI SYFT 

QWEELDSAGDLGDWGGPDALADLDPGEARRAVTERMCARIAAVMGYTDQSTVEPAVPLDKPLTEL
GLDSLMAVRIRNGARADFGVEPPVALILQGASLHDLTADLMRQLGLNDPDPALNNADTIRDRARQRA 
AARHGAAMRRRPKPEVQGG 

>Rv2946c pks1 TB.seq 3291503:3296350 MW:166642 SEQ ID NO:262
VISARSAEALTAC^GRLMAHVQANPGLDPIDVGCSLASRSVFEHRANAA/GASREQLIAGLAGLAAGE
PGAGVAVGQPGSVGK7VWFPGQGAQRIGMGRELYGELPVFAQAFDAVADELDRHLRLPLRDVIW 
GADADLLDSTEFAQPALFAVEVASFAVLRDWGVLPDFVMGHSVGELAAAHAAGVLTLADAAMLW^ 
RGRLMQALPAGGAMVAVAASEDEVEPLLGEGVGIAAINAPESW1SGAQAAANAIADRFAAQGRRVH
QLAVSHAFHSPLMEPMLEEFARVAARVQAREPQLGLVSNVTGELAGPDFGSAQYWVDHVRRPVRF 
ADSARHLQTLGATHFIEAGPGSGLTGSIEQSLAPAEAMWSMLGKDRPELASALGAAGQVFTTGVPV 
QWSAVFAGSGGRRVQLPTYAFQRRRFWETPGADGPADAAGLGLGATEHALLGAWERPDSDEWL 
TGRLSLADQPWLADHWNGWLFPGAGFVELVIRAGDEVGCALIEELVLAAPLVMHPGVGVQVQVW
GAADESGHRAVSVYSRGDQSQGWLLNAEGMLGVAAAETPMDLSVWPPEGAESVDISDGYAQLAE 
RGYAYGPAFQGLVAIWRRGSELFAEWAPGEAGVAVDRMGMHPAVLDAVLHALGLAVEKTQASTET 
RLPFCWRGVSLHAGGAGRVRARFASAGADAISVDVCDATGLPVLTVRSLVTRPITAEQLRAAVTAAG 
GASDQGPLEWWSPISWSGGANGSAPPAPVSWADFCAGSDGDASVWWELESAGGQASSWGS 

WAATHTALEVLQSWLGADRAATLWLTHGGVGLAGEDISDLAAAAVWGMARSAQAENPGRIVLIDT
DAAVDASVLAGVGEPQLLVRGGTVHAPRLSPAPALLALPAAESAWRLAAGGGGTLEDLVIQPCPEV 
QAPLQAGQVRVAVAAVGVNFRDWAALGMYPGQAPPLGAEGAGWLETGPEVTDLAVGDAVMGFL 
GGAGPI^WDQQLVTTRVPQGWSFAQAAAVPWFLTAWYGLADLAEIKAGESVLIHAGTGGVGMAAV 
QLARQWGVEVFVTASRGKWDTLRAMGFDDDHIGDSRTCEFEEKFLAVTEGRGVDWLDSLAGEFV 

DASLRLLVRGGRFLEMGKTDIRDAQEIAANYPGVQYRAFDLSEAGPARMQEMLAEVRELFDTRELH
RLPVTTWDVRCAPAAFRFMSQARHIGKVVLTMPSAI^DRLADGTWITGATGAVGGVl^RHLVGAY 
GVRHLVLASRRGDRAEGAAELAADLTEAGAKVQWACDVADRAAVAGLFAQLSREYPPVRGVIHAA 
GVLDDAVITSLTPDRIDTVLRAKVDAAWNLHQATSDLDLSMFALCSSIAATVGSPGQGNYSAANAFLD 
GLAAHRQAAGLAGISLAWGLWEQPGGMTAHLSSRDLARMSRSGLAPMSPAEAVELFDAALAIDHPL 

AVATLLDRAALDARAQAGALPALFSGLARRPRRRQIDDTGDATSSKSALAQRLHGLAADEQLELLVG
LVCLQAAAVLGRPSAEDVDPDTEFGDLGFDSLTAVELRNRLKTATGLTLPPTVIFDHPTPTAVAEYVA 
QQMSGSRPTESGDPTSQWEPAAAEVSVHA 

>Rv3014c ligA DNA ligase TB.seq 3372545:3374617 MW:75258 SEQ ID NO:263 
VSSPDADQTAPEVLRQWQALAEEVREHQFRYYVRDAPIISDAEFDELLRRLEALEEQHPELRTPDSP 

TQLVGGAGFATDFEPVDHLERMLSLDNAFTADELAAWAGRIHAEVGDAAHYLCELKIDGVALSLVYR
EGRLTRASTRGDGRTGEDVTLNARTIADVPERLTPGDDYPVPEVLEVRGEVFFRLDDFQALNASLVE 
EGKAPFANPRNSAAGSLRQKDPAVTARRRLRMICHGLGHVEGFRPATLHQAYLALRAWGLPVSEHT 
TLATDLAGVRERIDYWGEHRHEVDHEIDGWVKVDEVALQRRLGSTSRAPRWAIAYKYPPEEAQTKL 
LDIRVNVGRTGRITPFAFMTPVKVAGSTVGQATLHNASEIKRKGVLIGDTWIRKAGDVIPEVLGPWE 

LRDGSEREFIMPTTCPECGSPLAPEKEGDADIRCPNARGCPGQLRERVFUVASRNGLDIEVLGYEAG
VALLQAKVIADEGELFALTERDLLRTDLFRTKAGELSANGKRLLVNLDKAKAAPLWRVLVALSIRHVGP 
TAARALATEFGSLDAJAAASTDQUVAVEGVGPTIAAAVTEWFAVDWHREIVDKWRAAGVRMVDERD 
ESVPRTLAGLTIWTGSLTGFSRDDAKEAIVARGGKAAGSVSKKTNYWAGDSPGSKYDKAVELGVPI 
LDEDGFRRLLADGPASRT 

>Rv3025c - NifS-like protein TB.seq 3383885:3385063 MW:40948 SEQ ID NO:264

MAYLDHAATTPMHPAAIEAMAAVQRTIGNASSLHTSGRSARRRIEEARELIADKLGARPSEVIFTAGG 
TESDNLAVKGIYWARRDAEPHRRRIVTTEVEHHAVLDSVNWLVEHEGAHVTWLPTAADGSVSATAL 
REALQSHDDVALVSVMWANNEVGTILPIAEMSWAMEFGVPMHSDAIQAVGQLPLDFGASGLSAMS
VAGHKFGGPPGVGALLLRRDVTCVPLMHGGGQERDIRSGTPDVASAVGMATAAQIAVDGLEENSAR 
LRLLRDRLVEGVLAEIDDVCLNGADDPMRLAGNAHFTFRGCEGDALLMLLDANGIECSTGSACTAGV 
AQPSHVLIAMGVDAASARGSLRLSLGHTSVEADVDAALEVLPGAVARARRAALAAAGASR 

>Rv3080c pknK serine-threonine protein kinase TB.seq 3442656:3445985 MW:119420
SEQ ID NO:265 

MTDVDPHATRRDLVPNIPAELLEAGFDNVEEIGRGGFGVVYRCVQPSLDRAVAVKVLSTDLDRDNLE 

RFLREQF^GRLSGHPHI\mn.QVGVU\GGRPFIVMPYHAKNSLETLIRRHGPLDWRETLSIGVKL^ 

GALEAAHRVGTLHRDVKPGNILLTDYGEPQLTDFGIARIAGGFETATGVIAGSPAFTAPEVLEGASPTP 

ASDVYSLGATLFCALTGHAAYERRSGERVIAQFLRITSQPIPDLRKQGLPADVAAAIERAMARHPADR 

PATAADVGEELRDVQRRNGVSVDEMPLPVELGVERRRSPEAHAAHRHTGGGTPTVPTPPTPATKY 

RPSVPTGSLVTRSRLTDILRAGGRRRLILIHAPSGFGKSTLAAQWREELSRDGAAVAWLTIDNDDNNE 

VWFLSHLLESIRRVRPTUKESLGHVLEEHGDDAGRYVLTSLIDEIHENDDRIAWIDDWHRVSDSRTQ 

AALGFLLDNGCHHLQLIVTSWSRAGLPVGRLRIGDELAEIDSAALRFDTDEAAALLNDAGGLRLPRAD 

VQALTTSTDGWAAALRLAALSLRGGGDATQLLRGLSGASDVIHEFLSENVLDTLEPELREFLLVASVT 

ERTCGGLASALAGITNGRAMLEEAEHRGLFLQRTEDDPNWFRFHQMFADFLHRRLERGGSHRVAEL 

HRRASAWFAENGYLHEAVDHALAAGDPARAVDLVEQDETNLPEQSKMTTLI^IVQKLPTSMWSRA 

RLQLAIAWANILLQRPAPATGALNRFETALGRAELPEATQADLRAEADVLRAVAEVFADRVERVDDLL 

AEAMSRPDTLPPRVPGTAGNTAALAAICRFEFAEVYPLLDWAAPYQEMMGPFGTVYAQCLRGMAAR 

NRLDIVAALQNFRTAFEVGTAVGAHSHAARLAGSLLAELLYETGDLAGAGRLMDESYLLGSEGGAVD 

YU^RWIGARVKAAQGDHEGAADRLSTGGDTAVQLGLPRLAARINNERIRLGIALPAAVAADLLAPR 

TIPRDNGIATMTAELDEDSAVRLLSAGDSADRDC^CQRAGAL^AAIDGTRRPLAALQAQILHIETLAAT 

GRESDARNELAPVATKCAELGLSRLLVDAGLA 

>Rv3106 fprA adrenodoxin and NADPH ferredoxin reductase TB.seq 3474004:3475371
MW:49342 SEQ ID NO:266

MRPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRSGVAPDHPKIKSISKQFE 

KTAEDPRFRFFGNNAA/GEHVQPGELSERYDAVIYAVGAQSDRMLNIPGEDLPGSIAAVDFVGWYNA 

HPHFEQVSPDLSGARAWIGNGNVALDVARILLTDPDVLARTDIADHALESLRPRGIQEWIVGRRGPL 

QAAFTTLELREUUDLDGVDNA/IDPAELDGITDEDAAAVGKVCKQNIKVLRGYADREPRPGHRRM 

FLTSPIEIKGKRKVERIVLGRNELVSDGSGRVAAKDTGEREELPAQLWRSVGYRGVPTPGLPFDDQ 

SGTIPNVGGRINGSPNEYWGWIKRGPTGVIGTNKKDAQDTVDTLIKNLGNAKEGAECKSFPEDHAD 

QVADWIJWRQPKLVTSAHWQVIDAFERAAGEPHGRPRVKLASLAELLRIGLG 

>Rv3235 - TB.seq 3611296:3611934 MW:22659 SEQ ID NO:267 

MMASNQTAAQHSSATLQQAPRSIDDAGGCPLTISPIANSPGDTFAVTPWEYEPPPRNIPPCGQSSH 
AARRPHTPQLARRQPIRPSGRAPAAVTSTAKSPRLRQAGTFADAALRRVLEVIDRRRPVGQLRPLLA 
PGLVDSVLAVSRTAAGHQQGAAMLRRIRLTPAGPDTADTAAEVFGTYSRGDRIHAIACRVEQRPAGN
ETRWLMVALHIG 

>Rv3255c manA mannose-6-phosphate isomerase TB.seq 3635040:3636263 MW:43340 
SEQ ID NO:268 

VELLRGALRTYAWGSRTAIAEFTGRPVPAAHPEAELWFGAHPGDPAWLQTPHGQTSLLEALVADPE
GQLGSASRARFGDVLPFLVKVLAADEPLSLQAHPSAEQAVEGYLREERMGIPVSSPVRNYRDTSHK 
PELLVALQPFEAUVGFREAARTTELLRALAVSDLDPFIDLLSEGSDADGLRALFTTWITAPQPDIDVLV 
PAVLDGAIQWSSGATEFGAEAKTVLELGERYPGDAGVLAALLLNRISUVPGEAIFLPAGNLHAYVRG 
FGVEVMANSDNVLRGGLTPKHVDVPELLRVLDFAPTPKARLRPPIRREGLGLVFETPTDEFAATLLVL 
DGDHLGHEVDASSGHDGPQILLCTEGSATVHGKCGSLTLQRGTAAWVAADDGPIRLTAGQPAKLFR
ATVGL 

>Rv3264c rmlA2 glucose-1-phosphate thymidyltransferase TB.seq 3644897:3645973 MW:37840
SEQ ID NO:269 

LATHQVDANA^VGGKGTRLRPLTLSAPKPMLPTAGLPFLTHLLSRIAAAGIEHVILGTSYKPAVFEAEF 
GDGSALGLQIEYVTEEHPLGTGGGIANVAGKLRNDTAMVFNGDVLSGADI^QLLDFHRSNRADVTL
QLVRVGDPRAFGCVPTDEEDRWAFLEKTEDPPTDQINAGCYVFERNVIDRIPQGREVSVEREVFPA 
LLADGDCKIYGYVDASYWRDMGTPEDFVRGSADLVRGIAPSPALRGHRGEQLVHDGAAVSPGALLI 
GGTWGRGAEIGPGTRLDGAVIFDGVRVEAGCVIERSIIGFGARIGPRALIRDGVIGDGADIGARCELL 
SGARVWPGVFLPDGGI RYSSDV 


>Rv3368c - TB.seq 3780334:3780975 MW:23734 SEQ ID NO:270 

MTLNLSVDEVLTTTRSVRKRLDFDKPVPRDVLMECLELALQAPTGSNSQGWQWVFVEDAAKKKAIA 
DWLANARGYLSGPAPEYPDGDTRGERMGRVRDSATYLAEHMHRAPVLLIPCLKGREDESAVGGVS 
FWASLFPANAA/SFCLALRSRGLGSCWTTLHLLDNGEHKVADVLGIPYDEYSQGGLLPIAYTQGIDFRP 
AKRLPAESVTHWNGW

>Rv3382c lytB1 TB.seq 3796447:3797433 MW:34667 SEQ ID NO:271 
MAEVFVGPVAQGYASGEVWLI^SPRSFCAGVERAIEWKRVLDVAEGPVWRKQIVHNT^ 
DRGAVFVEDLDEIPDPPPPGAWVFSAHGVSPAVRAGADERGLQWDATCPLVAKVHAEAARFAAR 
GDTWFIGHAGHEETEGTLGVAPRSTLLVQTPADVAALNLPEGTQLSYLTQTTLALDETADN^DALR^
RFPTLGQPPSEDICYATTNRQRALQSMVGECDWLVIGSCNSSNSRRLVELAQRSGTPAYLIDGPDDI 
EPEWLSSVSTIGVTAGASAPPRLVGQVIDALRGYASITWERSIATETVRFGLPKQVRAQ 

>Rv3418c groES 10 kD chaperone TB.seq 3836985:3837284 MW:10773 SEQ ID NO:272 
VAKVNIKPLEDKILVQANEAETTTASGLVIPDTAKEKPQEGTWAVGPGRWDEDGEKRIPLDVAEGDT
VIYSKYGGTEIKYNGEEYLILSARDVLAWSK 
>Rv3423c alr TB.seq 3840193:3841416 MW:43357 SEQ ID NO:273
VKRFWENVGKPNDTTDGRGTTSLAMTPIS 

ADGYGHGATRVAQTALGAGAAELGVATVDEAUM-RADGITAPVI^WLHPPGIDFGPALLADVQVAVS 
SLRQLDELLHAVRRTGRTATVWKVDTGLNRNGVGPAQFPAMLTALRQAMAEDAVRLRGLMSHMV 
YADKPDDSINDVQAQRFTAFLAQAREQGVRFEVAHLSNSSATMARPDLTFDLVRPGIAVYGLSPVPA
LGDMGLVPAMTVKCAVALVKSIRAGEGVSYGHTWIAPRDTNl^LPIGYADGVFRSLGGRLEVLINGR 
RCPGVGRICMCX3FMVDLGPGPLDVAEGDEAILFGPGIRGEPTAQDWADLVGTIHYEWTSPRGRITR 
TYREAENR 

>Rv3490 otsA [alpha],-trehalose-phosphate synthase TB.seq 3908232:3909731 MW:55864
SEQ ID NO:274 

MAPSGGQEAQICDSETFGDSDFVWANRLPVDLERLPDGSTTWKRSPGGLVTALEPVLRRRRGAW 
VGWPGVNDDGAEPDLHVLDGPIIQDELELHPVRLSTTDIAQYYEGFSNATLWPLYHDVIVKPLYHRE 
VVWDRWDVNQRFAEAASRAAAHGATVVVVQDYQLQLVPKMLRMLRPDLTIGFFLHIPFPPVELFMQ 
MPWRTEIIQGLLGADLVGFHLPGGAQNFLILSRRLVGTDTSRGTVGVRSRFGAAVLGSRTIRVGAFPI
SVDSGALDHAARDRNIRRRAREIRTELGNPRKILLGVDRLDYTKGIDVRLKAFSELLAEGRVKRDDTV 
WQUKTPSRERVESYQTLRNDIERQVGHINGEYGEVGHPWHYLHRPAPRDELIAFFVASDVMLVTP 
LRDGMNLVAKEYVACRSDLGGALVLSEFTGAAAELRHAYLVNPHDLEGVKDGIEEALNQTEEAGRR 
RMRSLRRQVLAHDVDRWAQSFLDALAGAHPRGQG 


>Rv3598c lysS lysyl-tRNA synthase TB.seq 4041423:4042937 MW:55678 SEQ ID NO:275 
VSAADTAEDLPEQFRIRRDKRARLLAQGRDPYPVAVPRTHTLAEVRAAHPDLPIDTATEDIVGVAGRV 
IFARNSGKLCFATLQDGDGTQLQVMISLDKVGQAALDAWKADVDLGDIVYVHGAVISSRRGELSVLA 
DCWRIAAKSLRPLPVAHKEMSEESRVRQRYVDLIVRPEARAVARLRIAWRAIRTALQRRGFLEVETP 
VLQTLAGGAAARPFATHSNALDIDLYLRIAPELFLKRCIVGGFDKVFELNRVFRNEGADSTHSPEFSM
LETYQTYGTYDDSAVVTRELIQEVADEAIGTRQLPLPDGSWDIDGEWATIQMYPSLSVALGEEITPQT 
TVDRLRGIADSLGLEKDPAIHDNRGFGHGKLIEELWERTVGKSLSAPTFVKDFPVQTTPLTRQHRSIP 
GVTEKWDLYLRGIELATGYSELSDPWQRERFADQARAAAAGDDEAMVLDEDFLAALEYGMPPCTG 
TGMGIDRLLMSLTGLSIRETVLFPIVRPHSN 


>Rv3600c - similar to Bacillus subtilis protein YacB TB.seq 4043041:4043856 MW:29274
SEQ ID NO:276 

VLLAIDVRNTHTWGLLSGMKEHAKWQQWRIRTESEVTADELALTIDGLIGEDSERLTGTAALSTVPS 
VLHEVRIMLDQYWPSVPHVLIEPGVRTGIPLLVDNPKEVGADRIVNCLAAYDRFRKAAIWDFGSSICV 
DWSAKGEFLGGAIAPGVQVSSDAAAARSAALRRVELARPRSWGKNTVECMQAGAVFGFAGLVDG
LVGRIREDVSGFSVDHDVAIVATGHTAPLLLPELHTVDHYDQHLTLQGLRLVFERNLEVQRGRLKTAR 
>Rv3606c folK 7,8-dihydro-6-hydroxymethylpterin pyrophosphokinase TB.seq
4048181:4048744 MW:20732 SEQ ID NO:277 

MTRWLSVGSNLGDRLARLRSVADGLGDALIAASPIYEADPWGGVEQGQFLNAVLIADDPTCEPREW 
LRF^QEFERAAGRVRGQRWGPRNLDVDLIACYQTSATEALVEVTARENHLTLPHPLAHLRAFVLIPW 
IAVDPTAQLTVAGCPRPVTRLLAELEPADRDSVRLFRPSFDLNSRHPVSRAPES

>Rv3607c folX may be involved in folate biosynthesis TB.seq 4048744:4049142 MW:14553 
MADRIELRGLWHGRHGWDHERVAGQRFVID\m^ 

PPRKLIETVGAEIADHVMDDQRVHAVEVAVHKPQAPIPQTFDDVAWIRRSRRGGRGWWPAGGAV 

>Rv3608c folP dihydropteroate synthase TB.seq 4049138:4049977 MW:28812 SEQ ID NO:278
VSPAPVQVMGVLNVTDDSFSDGGCYLDLDDAVKHGLAMAAAGAGIVDVGGESSRPGATRVDPAVE 
TSRVIPWKELAAQGIWSIDTMRADVARAALQNGAQMVNDVSGGRADPAMGPLUVEADVPWVLMH 
WRAVSADTPHVPVRYGNWAEVRADLLASVADAVAAGVDPARLVLDPGLGFAKTAQHNWAILHALP 
ELVATGIPVLVGASRKRFLGALl^GPDGVMRPTDGRDTATAVISAL^ALHGAWGVRVHDVRASVDAI 

KWEAWMGAERIERDG

>Rv3609c folE GTP cyclohydrolase I TB.seq 4049977:4050582 MW:22395 SEQ ID NO:279 
MSQLDSRSASARIRVFDQQRAEAAVRELLYAIGEDPDRDGLVATPSRVARSYREMFAGLYTDPDSVL 
NTMFDEDHDELVLVKEIPMYSTCEHHLVAFHGVAHVGYIPGDDGRVTGLSKIARLVDLYAKRPQVQE 
RLTSQIADALMKKLDPRGVIWIEAEHLCMAMRGVRKPGSVTTTSAVRGLFKTNAASRAEALDLILRK 

>Rv3610c ftsH inner membrane protein, chaperone TB.seq 4050601:4052880 MW:81987

MNRKNVTRTITAIAVWLLGWSFFYFSDDTRGYKPVDTSVAITQINGDNVKSAQIDDREQQLRLILKKG 
NNETDGSEKVITKYPTGYAVDLFNALSAKNAKVSTWNQGSILGELLVYVLPLLLLVGLFVMFSRMQG 
GARMGFGFGKSRAKQLSKDMPKTTFADVAGVDEAVEELYEIKDFLQNPSRYQALGAKIPKGVLLYGP 
PGTGKTLLARAVAGEAGVPFFTISGSDFVEMFVGVGASRVRDLFEQAKQNSPCIIFVDEIDAVGRQR 

GAGLGGGHDEREQTLNQLLVEMDGFGDRAGVILIAATNRPDILDPALLRPGRFDRQIPVSNPDLAGR
RAN^RVHSKGKPMAADADLDGLAKRWGMTGADLANVINEAALLTARENGTVITGPALEEAV 
GPRRKGRIISEQEKKITAYHEGGHTLAAWAMPDIEPIYK\n*ILARGRTGGHAVAVPEEDKGLRTRSEMI 
AQLVFAMGGRAAEELVFREPTTGAVSDIEQATKIARSMVTEFGMSSKLGAVKYGSEHGDPFLGRTM 
GTQPDYSH EVAREI DEEVRKLI EAAHTEAWEI LTEYRDVL DTLAGELLEKETLH RPELESI FADVEKRP 

RLTMFDDFGGRIPSDKPPIKTPGELAIERGEPWPQPVPEPAFKAAIAQATQAAEAARSDAGQTGHGA
NGSPAGTHRSGDRQYGSTQPDYGAPAGWHAPGWPPRSSHRPSYSGEPAPTYPGQPYPTGQADP 
GSDESSAEQDDEVSRTKPAHG 

>Rv3671c - TB.seq 4112322:4113512 MW:40722 SEQ ID NO:280

MTPSQWLDIAVLAVAFIAAISGWRAGALGSMLSFGGVLLGATAGVLLAPHIVSQISAPRAKLFAALFLIL 
ALWVGEVAGWLGRAVRGAIRNRPIRLIDSVIGVGVQLW

SRVUVRVNEAAPTWLKWPKRLSALLNTSGLPANA-EPFSRTPVIPVASPDPALVNNPWAATEPSWKI 
RSLAPRCQIWLEGTGFVISPDR\^TNAHWAGSNN\nVYAGDKPFEATWSYDPSVDVAILAVPHLP 
PPPLVFAAEPAKTGAD\AAA.GYPGGGNFTATPARIREAIRLSGPDIYGDPEPVTRDVYTIRADVEQGD

SGGPLIDLNGQVLGWFGAA1DDAETGFVLTAGEVAGQLAKIGATQPVGTGACVS 

>Rv3682 ponA2 TB.seq 4121913:4124342 MW:84637 SEQ ID NO:281 

MPERLPAAITVLKLAGCCLLASWATALTFPFAGGLGLMSNRASEWANGSAQLLEGQVPAVSTMVD 
AKGNTIAWLYSQRRFEVPSDKIANTMKU^IVSIEDKRFADHSGVDWKGTLTGLAGYASGDLDTRGGS
TLEQQWKNYQLLVTAQTDAEKRAAVETTPARKLREIR^MLTLDKTFTKSEILTRYLNLVSFGNNSFG 
VQDAAQTYFGINASDLNWQQAALLAGMVQSTSTLNPYTNPDGALARRNNA/LDTMIENLPGEAEALR 
AAKAEPLGVLPQPNELPRGCIAAGDRAFFCDYVQEYLSRAGISKEQVATGGYLIRTTLDPEVQAPVKA 
AIDKYASPNLAGISSVMSVIKPGKDAHKVLA^SNRKYGLDLEAGETMRPQPFSLVGDGAGSIFKIFT 

TAAALDMGMGINAQLDVPPRFQAKGLGSGGAKGCPKETWCWNAGNYRGSMNVTDALATSPNTAF
AKLISQVGVGRAVDMAIKLGLRSYANPGTARDYNPDSNESLADFVKRQNLGSFTLGPIELNALELSNV 
MTLASGGVWCPPNPIDQLIDRNGNEVAVTTETCDQWPAGLANTLANAMSKDAVGSGTAAGSAGA 
AGWDLPMSGKTGTTEAHRSAGFVGFTNRYAAANYIYDDSSSPTDLCSGPLRHCGSGDLYGGNEPS 
RTWFAAMKPIANNFGEVQLPPTDPRYVDGAPGSRVPSVAGLDVDAARQRLKDAGFQVADQTNSVN 

SSAKYGEWGTSPSGQTIPGSIVTIQISNGIPPAPPPPPLPEDGGPPPPVGSQWEIPGLPPITIPLLAP
PPPAPPP 

>Rv3721c dnaZX DNA polymerase III, [gamma] (dnaZ) and [tau] (dnaX) TB.seq 4164995:4166728
MW:61892 SEQ ID NO:282 

VALYRKYRPASFAEWGQEHVTAPLSVALDAGRINHAYLFSGPRGCGKTSSARILARSLNCAQGPTA 
NPCGVCESCVSU^PNAPGSIDWELDAASHGGVDDTRELRDRAFYAPVQSRYRVFIVDEAHMVTTA
GFNALLKIVEEPPEHLIFIFATTEPEKVLPTIRSRTHHYPFRLLPPRTMRALLARICEQEGVWDDAVYP 
LVIRAGGGSPRDTLSVLDQLLAGAADTHVTYTRALGLLGWDVALIDDAVDALAACDAAA^ 
DGGHDPRRFATDLLERFRDLIVLQSVPDAASRGWDAPEDALDRMREQAARIGRATLTRYAEWQA 
GLGEMRGATAPRLLLEWCARLLLPSASDAESALLQRVERIETRLDMSIPAPQAVPRPSAAAAEPKHQ 
PAREPRPVI^PTPASSEPWAAVRSMWPWRDIWRLRSRTTEVMI^GAWRALEDNTLVLTHESAPL
ARRLSEQRNADVLAEALKDALGVNWRVRCETGEPAAAASPVGGGANVATAKAVNPAPTANSTQRD 
EEEHMLAEAGRGDPSPRRDPEEVALELLQNELGARRIDNA 
>Rv3783 - TB.seq 4229255:4230094 MW:32337 SEQ ID NO:283 

MTFMDAQASFQTQSRTLARVRGDLVDGFRRHELWLHLGWQDIKQRYRRSVLGPFWITIATGTTAVA 
MGGLYSKLFRLELSEHLPYVTLGLIVWNLINAAILDGAEVFVANEGLIKQLPAPLSVHWRLVW

FAH N I VI YFVI Al I FPKPWSWADLSFLPALALIFLNCVWVSLCFGI LATRYRDI GPLLFSWQLLFFMTPI I 

WNDETLRRQGAGRWSSIVELNPLLHYLDIVRAPLLGAHQELRHWLWLVLTWGWMLAAFAMRQYR 

ARVPYWV 

>Rv3789 - TB.seq 4235371:4235733 MW:13378 SEQ ID NO:284 
MRFNATTGGLAGIVDFGLYWLYIWAGLQVDLSKAISFIVGTITAYLINRRVVTFQAEPSTARFVAVM
GITFAVQVGLN HLCLALLH YRAWAI PVAFVI AQGTATVI N Fl VQRAVI FRI R 
>Rv3790 - TB.seq 4235776:4237158 MW:50164 SEQ ID NO:285
MLSVGATTTATRLTGWGRTAPSVANVLRTPDAEMIVKAVARVAESGGGRGAIARGLGRSYGDNAQN
GGGLVIDMTPLNTIHSIDADTKLVDIDAGVNLDQLMKAALPFGLVVVPVLPGTRQVTVGGAIACDI^ 
NHHSAGSFGNHVRSMDLLTADGEIRHLTPTGEDAELFWATVGGNGLTGIIMRATIEMTPTSTAYFIAD 
GDWASLDETIALHSDGSEARYTYSSAWFDAISAPPKLGRAAVSRGRLATVEQLPAKLRSEPLKFDAP 
QLLTLPDVFPNGU^NKYTFGPIGELmRKSGTYRGKVQNLTQFYHPLDMFGEWNRAYGPAGFLQY^
FVIPTEAVDEFKKIIGV1QASGHYSFLNVFKLFGPRNQAPLSFPIPGWNICVDFPIKDGLGKFVSELDRR 
VLEFGGRLYTAKDSRTTAETFHAMYPRVDEWISVRRKVDPLRVFASDMARRLELL 
>Rv3791 - TB.seq 4237162:4237923 MW:27470 SEQ ID NO:286 
MVLDAVGNPQTVLLLGGTSEIGLAICERYLHNSAARIV^ 
DALDTDSHPKMIEAAFSGGDVDVAIVAFGLLGDAEELWQNQRKAVQIAEtNYTAAVSVGVLLAEKMR
AQGFGQIIAMSSAAGERVRRANFWGSTKAGLDGFYLGLSEALREYGVRVLVIRPGQVRTRMSAHLK 
EAPLTVDKEYVANLAVTASAKGKELVWAPAAFRYVMMVLRHIPRSIFRKLPI 
>Rv3794 embA TB.seq 4243230:4246511 MW:116694 SEQ ID NO:287

VPHDGNERSHRIARLAAWSGIAGLLLCGIVPLLPVNQTTATIFWPQGSTADGNITQITAPLVSGAPRA 
LDISIPCSAIATLPANGGLVLSTLPAGGVDTGKAGLFVRANQDTNAA/AFRDSVAAVAARSTIAAGGCS

ALHIWADTGGAGADFMGIPGGAGTLPPEKKPQVGGIFTDLKVGAQPGLSARVDIDTRFITTPGALKKA 

VMLLGVLAVLVAMVGLAALDRLSRGRTLRDWLTRYRPRVRV^ 

DDGYLLTVARVAPKAGWANYYRYFGTTEAPFDWYTSVLAQLAAVST^ 

SRFVLRRLGPGPGGLASNRVAVFTAGAWLSAVV^PFNNGLRPEPLI^^ 
AAVAI I VATLTATLAPQGLI ALAPLLTGARAI AQRI RRRRATDGLLAPLAVLAAALSLITVWFRDQTLATV

AESARIKYKVGPTIAWYQDFLRYYFLTVESNVEGSMSRRFAVLVLLFCLFGVLFVLLRRGRVAGLASG 

PAWRLIGTTAVGLLLLTFTPTKWAVQFGAFAGLAGVLGAVTAFT^ 

WATSGINGWFWGNYGVPmDIQPVIASHPWSMFLTLSILTGLLAAmHFRMDYAGHTEVKDNRR 
NRII^STPLLWAVIMVAGEVGSMAKAAVFRYPLYTTAKANLTALSTGLSSCAMADDVLAEPDPNAGM 

LQPVPGQAFGPDGPLGGISPVGFKPEGVGEDLKSDPWSKPGLVNSDASPNKPNAAITDSAGTAGG
KGPVGINGSHAALPFGLDPARTPN^GSYGENNLAATATSAWYQLPPRSPDRPLVWSAAGAIWSYK 
EDGDFIYGQSLKLQWGVTGPDGRIQPLGQVFPIDIGPQPAWRNLRFPLAWAPPEADVARIVAYDPNL 
SPEQWFAFTPPRVPVLESLQRLIGSATPVLMDIATAANFPCQRPFSEHLGIAELPQYRILPDHKQTAA 
SSNLWQSSSTGGPFLFTC^LLRTSTIATYLRGDVVYRDWGSVEQYHRLVPADCMPDAWEEGVITVP 

GWGRPGPIRALP

>Rv3795 embB TB.seq 4246511:4249804 MW:118023 SEQ ID NO:288

MTQCASRRKSTPNRAILGAFASARGTRWVATIAGLIGFVLSVATPLLPWQTTAMLDWPQRGQLGSV 
TAPLISLTPVDFTATVPCDWRAMPPAGGWLGTAPKQGKDANLQALFVWSAQRVDVTDRNWILS 
VPREQVTSPQCQRIEVTSTHAGTFANFVGLKDPSGAPLRSGFPDPNLRPQIVGVFTDLTGPAPPGLA 
VSATIDTRFSTRPTTLKLLAIIGAIVATWALIALWRLDQLDGRGSIAQLLLRPFRPASSPGGMRRLIPAS
WRTFTLTDAWIFGFLLWHVIGANSSDDGYILGMARVADHAGYMSNYFRWFGSPEDPFGWYYNLLA 
LMTHVSDASLWMRLPDLAAGLVCWLLLSREVLPRLGPAVEASKPAYWAAAMVLLTAWMPFNNGLR 
PEGIIALGSLVTWLIERSMRYSRLTPAAI^WTAAFTLGVQ

GTLPLVSPMLAAGWILTWFADQTLSTVLEAT^^ 

FLITALCLFTAVFIMLRRKRIPSVARGPAWRLMGVIFGTM^ 

TVLVSPSVLRWSRNRMAFUVALFFLU^LCWATTNGVVWWSSYGVPFNSAMPKIDGIW 
MGYMWLHFAPRGAGEGRLIRALTTAPVPIVAGFMAAVFVASMVAGIVRQYPTYSNGWSNVRAFV
GGCGLADDVLVEPDTNAGFMKPLDGDSGSWGPLGPLGGVNPVGFTPNGVPEHTVAEAIVMKPNQP 
GTDYDWDAPTKLTSPGINGSWPLPYGLDPARVPLAGTYTTGAQQQSTLVSAWYLLPKPDDGHPLV 
NA/TAAGKIAGNSVLHGYTPGQTWLEYAMPGPGALVPAGRMVPDDLYGEQPKAWRNLRFARAKMP 
ADAVAVRWAEDLSLTPEDWIAWPPRVPDLRSLQEWGSTQPVLLDWAVGLAFPCQQPMLHANGIA 
EIPKFRITPDYSAKKLDTDTWEDGTNGGLLGITDLLLRAHVMATYLSRDWARDWGSLRKFDTLVDAP
PAQLELGTATRSGLWSPGKI RIGP 

>Rv3834c serS seryl-tRNA synthase TB.seq 4307655:4308911 MW:45293 SEQ ID NO:289

vidlkllrenpdavrrsqlsrgedpalvdalltadaarravistadslraeqkaasksvggaspeerp 
pllrrakei^eqvkaaeadeveaeaaftaahi^isiwivdgvpaggeddyavldwgepsylenpkd 
hlelgeslglidmqrgakvsgsrfyfltgrgallqlgllqlauoavdngfvptippvlvrpevmvgt
gflgahaeevyrvegdglylvgtsevplagyhsgeildlsrgplryagwsscfrreagshgkdtrg 
iirvhqfdkvegfwctpadaeheherllgwqrqmlarievpyrvidvaagdlgssaarkf[x:eawi 

PTQGAYRELTSTSNCTTFQARRLATRYRDASGKPQIMTLNGTLATTRWLVAILENHQRPDGSVRVP 
DALVPFVGVEVLEPVA 

>Rv3907c pcnA polynucleotide polymerase TB.seq 4391631:4393070 MW:53057 SEQ ID NO:290
VPEAVQEADLLTAAAVALNRHAALLRELGSVFAAAGHELYLVGGSVRDALLGRLSPDLDFTTDARPE 
RVQEIVRPWADAVWDTGIEFGTVGVGKSDHRMEITTFRADSYDRVSRHPEVRFGDCLEGDLVRRDF 
TTNAMAVRVTATGPGEFLDPLGGL^ALRAKVLDTPAAPSGSFGDDPLRMLRAARFVSQLGFAVAPR 
VRAAIEEMAPQI^RISAERVAAELDKLLVGEDPAAGIDLMVQSGMGAWLPEIGGMRMAIDEHHQHK 

DWQHSLTVLRQAIALEDDGPDLVLRWAALLHDIGKPATRRHEPDGGVSFHHHEWGAKMVRKRMR
ALKYSKQMIDDISQLWLHLRFHGYGDGKmDSAVRRYWDAGALLPRLHKLVF^DCTTRNKRR^ 
LQASYDRLEERIAELAAQEDLDRVRPDLDGNQIMAVLDIPAGPQVGEAWRYLKELRLERGPLSTEEA 
TTELLSWWKSRGNR 

A number of embodiments of the invention have been described. Nevertheless,
it will be understood that various modifications may be made without departing from
the spirit and scope of the invention. Accordingly, other embodiments are within the scope
of the following claims.

WHAT IS CLAIMED IS:

1. A method for identifying a nucleic acid or a polypeptide sequence that
may be a target for a drug comprising the following steps:

(a) providing a first nucleic acid or a polypeptide sequence that is known to
be a drug target;

(b) providing at least one algorithm selected from the group consisting of a "domain
fusion" method, a "phylogenetic profile" method and a "physiologic linkage" method,
wherein the algorithm is capable of analyzing a functional relationship between nucleic acid or
polypeptide sequences; and

(c) comparing the first nucleic acid or the polypeptide drug target sequence to a
plurality of sequences using at least one of the algorithms as set forth in step (b) to identify a
second sequence that has a functional relationship to the first sequence, thereby identifying a
nucleic acid or a polypeptide sequence that may be a target for a drug.

2. A method for identifying a nucleic acid or a polypeptide sequence that
may be essential for the growth or viability of an organism comprising the following steps:

(a) providing a first nucleic acid or a polypeptide sequence that is known to
be essential for the growth or viability of an organism;

(b) providing at least one algorithm capable of analyzing a functional relationship
between nucleic acid or polypeptide sequences selected from the group consisting of a
"domain fusion" method, a "phylogenetic profile" method and a "physiologic linkage"
method; and

(c) comparing the first nucleic acid or the polypeptide sequence to a plurality of
sequences using at least one of the algorithms as set forth in step (b) to identify a second
sequence that has a functional relationship to the first sequence, thereby identifying a nucleic
acid or a polypeptide sequence that may be essential for the growth or viability of an
organism.

3. The method of claim 1 or claim 2, wherein the drug is an anti-microbial
drug.

4. The method of claim 1 or claim 2, wherein the first nucleic acid or a
polypeptide sequence is derived from a pathogen.

5. The method of claim 4, wherein the pathogen is a microorganism. 



6. The method of claim 1 or claim 2, wherein the microorganism is
Mycobacterium tuberculosis (MTB).

7. The method of claim 1 or claim 2, wherein the plurality of sequences
used to identify a second sequence comprises a database of the gene sequences of an entire
genome of an organism.

8. The method of claim 1 or claim 2, wherein the plurality of sequences
used to identify a second sequence comprises a database of the gene sequences derived from
a pathogen.



9. The method of claim 1 or claim 2, wherein the "phylogenetic profile" 
method algorithm comprises 

(a) obtaining data, comprising a list of proteins from at least two genomes; 

(b) comparing the list of proteins to form a protein phylogenetic profile for 
each protein, wherein the protein phylogenetic profile indicates the presence or absence of a 
protein belonging to a particular protein family in each of the at least two genomes based on 
homology of the proteins; and 

(c) grouping the list of proteins based on similar profiles, wherein proteins 
with similar profiles are indicated to have a functional relationship. 
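
By way of illustration only, the grouping recited in steps (a) to (c) may be sketched as follows. The sketch assumes that homology has already been determined, so that each genome is represented simply as a set of protein family identifiers, and it groups only families whose profiles are identical, which is the simplest possible notion of "similar". The genome and family names are hypothetical.

```python
from collections import defaultdict

def build_profiles(genome_families):
    """Map each protein family to a 0/1 vector of presence across genomes.

    genome_families: dict of genome name -> set of family identifiers
    found in that genome (homology assumed to be precomputed).
    """
    genomes = sorted(genome_families)
    families = set().union(*genome_families.values())
    return {
        fam: tuple(int(fam in genome_families[g]) for g in genomes)
        for fam in sorted(families)
    }

def group_by_profile(profiles):
    """Return groups of families that share an identical profile."""
    groups = defaultdict(list)
    for fam, profile in profiles.items():
        groups[profile].append(fam)
    # Only groups with two or more members suggest a functional relationship.
    return sorted(sorted(members) for members in groups.values() if len(members) > 1)

# Hypothetical input: three genomes and three protein families.
GENOMES = {
    "genome1": {"famA", "famB", "famC"},
    "genome2": {"famA", "famB"},
    "genome3": {"famC"},
}
profiles = build_profiles(GENOMES)
linked = group_by_profile(profiles)
print(linked)
```

Here famA and famB are present in exactly the same genomes and are therefore grouped together, while famC has a different profile and is left ungrouped.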

10. The method of claim 9, wherein the phylogenetic profile is in the form 
of a vector, matrix or phylogenetic tree. 

11. The method of claim 9, comprising determining the significance of
homology between the proteins by computing a probability (p) value threshold.

12. The method of claim 11, wherein the probability is set with respect to
the value 1/NM, based on the total number of sequence comparisons that are to be
performed, wherein N is the number of proteins in the first organism's genome and M is the
number of proteins in all other genomes.
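
As a non-limiting illustration of the threshold described above, the following sketch takes the threshold to be exactly 1/NM and treats a match as significant when its p value falls below that threshold. Both the strict inequality and the example values of N and M are assumptions made for the example.

```python
import math

def significance_threshold(n_proteins, m_proteins):
    """Threshold 1/(N*M), where N*M is the number of sequence comparisons."""
    return 1.0 / (n_proteins * m_proteins)

def is_significant(p_value, n_proteins, m_proteins):
    """True when a match is rarer than one expected chance hit overall."""
    return p_value < significance_threshold(n_proteins, m_proteins)

# Hypothetical sizes: N proteins in the query genome, M in all other genomes.
N, M = 4000, 100000
threshold = significance_threshold(N, M)
print(threshold)
print(is_significant(1e-12, N, M), is_significant(1e-6, N, M))
```

Scaling the threshold by the number of comparisons keeps the expected number of chance matches across the whole search at roughly one or fewer.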

13. The method of claim 9, wherein the presence or absence is determined by
calculating an evolutionary distance.

14. The method of claim 13, wherein the evolutionary distance is
calculated by:

(a) aligning two sequences from the list of proteins;

(b) determining an evolution probability process by constructing a conditional
probability matrix $p(aa \to aa')$, where $aa$ and $aa'$ are any amino acids, said conditional
probability matrix being constructed by converting an amino acid substitution matrix from a
log odds matrix to said conditional probability matrix;

(c) accounting for an observed alignment of the constructed conditional
probability matrix by taking the product of the conditional probabilities for each aligned pair
during the alignment of the two sequences, represented by
$P(p) = \prod_{n} p(aa_n \to aa'_n)$; and

(d) determining an evolutionary distance $\alpha$ from the powers equation
$p' = p^{\alpha}(aa \to aa')$, maximizing for $P$.
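
By way of example only, steps (c) and (d) may be sketched as a search over powers of the conditional probability matrix. The sketch makes several simplifying assumptions that are not part of the claim: a toy three-letter alphabet with a hypothetical matrix stands in for the twenty amino acids, only integer powers are tried, gap columns are skipped, and the two sequences are taken to be already aligned.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def alignment_log_likelihood(p, pairs, index):
    """log P(p): sum of log conditional probabilities over aligned pairs."""
    return sum(math.log(p[index[x]][index[y]]) for x, y in pairs)

def evolutionary_distance(seq_a, seq_b, p, alphabet, max_alpha=100):
    """Return the integer alpha in 1..max_alpha that maximizes P(p^alpha)."""
    index = {residue: i for i, residue in enumerate(alphabet)}
    # Keep only aligned residue pairs; columns containing a gap are skipped.
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    best_alpha, best_ll = 1, float("-inf")
    p_alpha = p  # holds p raised to the current power, starting at alpha = 1
    for alpha in range(1, max_alpha + 1):
        ll = alignment_log_likelihood(p_alpha, pairs, index)
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
        p_alpha = mat_mul(p_alpha, p)
    return best_alpha

# Hypothetical conditional probability matrix; each row sums to 1.
ALPHABET = "ABC"
P = [[0.90, 0.05, 0.05],
     [0.05, 0.90, 0.05],
     [0.05, 0.05, 0.90]]

identical = evolutionary_distance("ABCABC", "ABCABC", P, ALPHABET)
diverged = evolutionary_distance("ABCABC", "ABCBCA", P, ALPHABET)
print(identical, diverged)
```

Identical sequences are best explained by the smallest power, while sequences with mismatches are best explained by a larger power, which is the sense in which the maximizing power acts as a distance.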

15. The method of claim 14, wherein the conditional probability matrix is 
defined by a Markov process with substitution rates, over a fixed time interval. 






16. The method of claim 14, where the conversion from an amino acid
substitution matrix to a conditional probability matrix is represented by:

$P(i \to j) = P_j \cdot 2^{\mathrm{BLOSUM62}_{ij}/2}$

where BLOSUM62 is an amino acid substitution matrix, and $P(i \to j)$ is the
probability that amino acid i is replaced by amino acid j through point mutations according to
BLOSUM62 scores.

17. The method of claim 16, where the $P_j$'s are the abundances of amino acid
j and are computed by solving a plurality of linear equations given by the normalization
condition that:

$\sum_{j} P(i \to j) = 1$.
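
For illustration only, the conversion and normalization may be sketched as follows. The sketch assumes the standard log-odds relationship for scores in half-bit units, so that the odds ratio for a pair is 2 raised to half the score and the conditional probability is that ratio multiplied by the abundance of the replacing residue. A toy three-letter score matrix, built here from hypothetical abundances and pair frequencies, stands in for BLOSUM62, and the linear equations are solved by plain Gaussian elimination.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def conditional_matrix(scores):
    """Return (abundances, conditional probabilities) from half-bit scores."""
    n = len(scores)
    odds = [[2.0 ** (s / 2.0) for s in row] for row in scores]
    # Normalization: for every i, sum_j abundance[j] * odds[i][j] = 1.
    abundances = solve_linear(odds, [1.0] * n)
    cond = [[abundances[j] * odds[i][j] for j in range(n)] for i in range(n)]
    return abundances, cond

# Hypothetical abundances and symmetric pair frequencies (rows sum to TRUE_P).
TRUE_P = [0.5, 0.3, 0.2]
JOINT = [[0.40, 0.06, 0.04],
         [0.06, 0.20, 0.04],
         [0.04, 0.04, 0.12]]
SCORES = [[2.0 * math.log2(JOINT[i][j] / (TRUE_P[i] * TRUE_P[j]))
           for j in range(3)] for i in range(3)]

abundances, cond = conditional_matrix(SCORES)
print([round(a, 6) for a in abundances])
```

Because the toy scores were generated from known abundances, solving the normalization equations recovers those abundances, and every row of the resulting conditional matrix sums to one.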

18. The method of claim 1 or claim 2, wherein the "physiologic linkage"
method algorithm identifies proteins and nucleic acids that participate in a common
functional pathway.

19. The method of claim 1 or claim 2, wherein the "physiologic linkage"
method algorithm identifies proteins and nucleic acids that participate in the
synthesis of a common structural complex.

20. The method of claim 1 or claim 2, wherein the "physiologic linkage"
method algorithm identifies proteins and nucleic acids that participate in a
common metabolic pathway.


21. The method of claim 1 or claim 2, wherein the "domain fusion"
method algorithm comprises

(a) aligning first primary amino acid sequences of multiple distinct non-homologous
polypeptides to second primary amino acid sequences of a plurality of proteins; and

(b) for any alignment found between the first primary amino acid sequences of all of
such multiple distinct non-homologous polypeptides and at least one protein of the second
primary amino acid sequences, outputting an indication identifying the aligned second
primary amino acid sequence as an indication of a functional link between the aligned first
and second polypeptide sequences.
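
By way of illustration only, step (b) may be sketched as a search for a single second protein to which two distinct first polypeptides both align. The sketch assumes that the alignments of step (a) have already been computed by some alignment tool and are supplied as (query, subject, start, end) tuples. It further assumes, as a stand-in for testing non-homology, that two queries are linked only when they align to non-overlapping regions of the subject. All protein names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def domain_fusion_links(hits):
    """Find query pairs that align to separate regions of one subject protein.

    hits: iterable of (query, subject, start, end), with start and end being
    1-based inclusive coordinates of the aligned region on the subject.
    Returns sorted (query1, query2, subject) triples.
    """
    by_subject = defaultdict(list)
    for query, subject, start, end in hits:
        by_subject[subject].append((query, start, end))

    links = set()
    for subject, regions in by_subject.items():
        for (q1, s1, e1), (q2, s2, e2) in combinations(regions, 2):
            non_overlapping = e1 < s2 or e2 < s1
            if q1 != q2 and non_overlapping:
                links.add((min(q1, q2), max(q1, q2), subject))
    return sorted(links)

# Hypothetical precomputed alignments.
HITS = [
    ("protA", "fusionC", 1, 200),
    ("protB", "fusionC", 250, 480),
    ("protD", "fusionC", 150, 300),   # overlaps both protA and protB regions
    ("protE", "otherF", 1, 100),      # only one query hits this subject
]
links = domain_fusion_links(HITS)
print(links)
```

In this example protA and protB are reported as functionally linked through fusionC, whereas protD is not linked to either because its aligned region overlaps theirs.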

22. The method of claim 21, wherein the aligning is performed by an
algorithm selected from the group consisting of a Smith-Waterman algorithm, a
Needleman-Wunsch algorithm, a BLAST algorithm, a FASTA algorithm, and a PSI-BLAST algorithm.

23. The method of claim 21, wherein the multiple distinct non-homologous
polypeptides are obtained by translating a nucleic acid sequence from a genome
database.

24. The method of claim 21, wherein the plurality of proteins have a
known function.

25. The method of claim 21, wherein at least one of the multiple distinct 
non-homologous polypeptides has a known function. 

26. The method of claim 21, wherein at least one of the multiple distinct
non-homologous polypeptides has an unknown function.

27. The method of claim 21, wherein the alignment is based on the degree
of homology of the multiple distinct non-homologous polypeptides to the plurality of
proteins.

28. The method of claim 21, further comprising determining the
significance of the aligned and identified second primary amino acid sequence by computing
a probability (p) value threshold.


29. The method of claim 28, wherein the probability threshold is set with
respect to the value 1/NM, based on the total number of sequence comparisons that are to be
performed, wherein N is the number of proteins in a first organism's genome and M is the
number of proteins in all other genomes.


30. The method of claim 21, further comprising filtering excessive
functional links between one first primary amino acid sequence of multiple distinct
non-homologous polypeptides and an excessive number of other distinct non-homologous
polypeptides for any alignment found between the first primary amino acid sequences of the
distinct non-homologous polypeptides and at least one of the second primary amino acid
sequences of the plurality of proteins.
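
As a non-limiting sketch of the filtering described above, links may be discarded whenever either linked polypeptide participates in more than a chosen number of links. The cutoff value and the representation of links as pairs of names are assumptions made for the example; the claim does not specify how "excessive" is determined.

```python
from collections import Counter

def filter_excessive_links(links, max_links):
    """Drop links that involve any polypeptide with more than max_links links.

    links: list of (protein_a, protein_b) pairs.
    max_links: hypothetical cutoff above which a polypeptide is treated as
    linking to an excessive number of partners.
    """
    degree = Counter()
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    return [(a, b) for a, b in links
            if degree[a] <= max_links and degree[b] <= max_links]

# Hypothetical links: "hub" is linked to three partners, the rest to one.
LINKS = [("hub", "p1"), ("hub", "p2"), ("hub", "p3"), ("p4", "p5")]
kept = filter_excessive_links(LINKS, max_links=2)
print(kept)
```

With a cutoff of two, every link involving the highly connected polypeptide is removed and only the link between p4 and p5 is retained.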

31. A computer program product, stored on a computer-readable medium, 
for identifying a nucleic acid or a polypeptide sequence that may be a target for a drug, the 
computer program product comprising instructions for causing a computer system to be 
capable of: 

(a) inputting a first nucleic acid or a polypeptide sequence that is known to be
a drug target;

(b) accessing at least one algorithm capable of analyzing a functional relationship
between nucleic acid or polypeptide sequences selected from the group consisting of a
"domain fusion" method, a "phylogenetic profile" method and a "physiologic linkage"
method; and

(c) comparing the first nucleic acid or the polypeptide drug target sequence to
a plurality of sequences using at least one of the algorithms set forth in step (b) to identify a
second sequence that has a functional relationship to the first sequence and generating an
output identifying a nucleic acid or a polypeptide sequence that may be a target for a drug.

32. A computer program product, stored on a computer-readable medium, 
for identifying a nucleic acid or a polypeptide sequence that may be essential for the growth 
or viability of an organism, the computer program product comprising instructions for 
causing a computer system to be capable of: 

(a) providing a first nucleic acid or a polypeptide sequence that is known to 
be essential for the growth or viability of an organism; 

(b) accessing at least one algorithm capable of analyzing a functional relationship
between nucleic acid or polypeptide sequences selected from the group consisting of a
"domain fusion" method, a "phylogenetic profile" method and a "physiologic linkage"
method; and

(c) comparing the first nucleic acid or the polypeptide sequence to a plurality of
sequences using at least one of the algorithms set forth in step (b) to identify a second
sequence that has a functional relationship to the first sequence and generating an output
identifying a nucleic acid or a polypeptide sequence that may be essential for the growth or
viability of an organism.

33. A computer system, comprising: 

(a) a processor; and 

(b) a computer program product as set forth in claim 31 or claim 32.

Figure 1
Figure 2
Figure 3
Figure 4
Figure 5